LARGE DEVIATION ASYMPTOTICS FOR OCCUPANCY PROBLEMS

By Paul Dupuis, Carl Nuzman and Phil Whiting
Brown University, Bell Labs and Bell Labs 

In the standard formulation of the occupancy problem one con- 
siders the distribution of r balls in n cells, with each ball assigned 
independently to a given cell with probability 1/n. Although closed 
form expressions can be given for the distribution of various inter- 
esting quantities (such as the fraction of cells that contain a given 
number of balls), these expressions are often of limited practical use. 
Approximations provide an attractive alternative, and in the present 
paper we consider a large deviation approximation as r and n tend to 
infinity. In order to analyze the problem we first consider a dynam- 
ical model, where the balls are placed in the cells sequentially and 
"time" corresponds to the number of balls that have already been 
thrown. A complete large deviation analysis of this "process level" 
problem is carried out, and the rate function for the original prob- 
lem is then obtained via the contraction principle. The variational 
problem that characterizes this rate function is analyzed, and a fairly 
complete and explicit solution is obtained. The minimizing trajec- 
tories and minimal cost are identified up to two constants, and the 
constants are characterized as the unique solution to an elementary 
fixed point problem. These results are then used to solve a number of 
interesting problems, including an overflow problem and the partial 
coupon collector's problem. 

1. Introduction. Urn occupancy problems center on the distribution of 
r balls in n cells, typically with each ball independently assigned to a given 
cell with probability 1/n. The literature on the general topic is enormous. 
See, for example, [9, 19, 20] and the references therein. 

There are many different questions one can pose. For example, it may
be that one is interested in the distribution of $(\Gamma_0, \Gamma_1, \ldots)$, where $\Gamma_i$ is the
number of cells containing exactly $i$ balls after all $r$ balls have been thrown.
In the classical occupancy problem [9, 14, 20], one is interested only in the
distribution of unoccupied urns $\Gamma_0$. In other cases, one might be interested in
the (random) number of balls required to fill all cells to a given level, or the
number required so that a given fraction are filled to that level (the so-called
coupon collector's or dixie cup problem). In biology the inverse problem of
estimating the number of balls thrown from the number of occupied cells also 
arises and is used to estimate species abundance [17]. Related applications 
in biology appear in [2, 3], and an application in computer science appears 
in [21]. 

Another related problem of interest is the overflow problem, in which the 
urns are supposed to have a finite capacity C, and the number of balls that 
overflow is the random variable of interest. Ramakrishna and Mukhopadhyay
[24] describe an application in computer science concerned with memory
access, and [12] considers an application to optical switches. In [12] one
is concerned with dimensioning the number of wavelength converters so as 
to reduce the probability of packet loss across the switch to an acceptable 
level. In [12] any-color-to-any-color converters are considered. However, by 
extending the results proved here to the case where the balls have distinct 
colors, dimensioning in the case where we have many-to-one color converters 
can be carried out and the blocking probability of the output estimated. 

A wide range of results have been proved for the occupancy problem 
by using "exact" approaches. For example, combinatorial methods are used 
in [9, 14, 20], and methods that utilize generating functions are discussed 
in [20]. The implementation of these results, however, can be difficult. For ex- 
ample, in applying the combinatorial results one must compute the difference 
between large quantities that appear in the inclusion-exclusion formula for 
the probability that a given fraction of cells are occupied. An analogous dif- 
ficulty occurs with techniques based on moment generating functions, since 
one must invert the generating function itself. 

Asymptotic methods provide an attractive alternative to both of these ap- 
proaches. One reason is that they often offer good approximations with only 
a modest computational effort. A second, perhaps more important reason, 
is that superior qualitative insights can often be obtained. Indeed, a range 
of asymptotic results have already been obtained for these problems (see, 
e.g., [18]). The first large deviations principle (LDP) for urn problems that 
we are aware of was established in [28] for the special case of the classical 
occupancy problem. This result was applied in [21] to a boolean satisfiabil- 
ity problem in computer science. Reference [6], which appeared while the 
present paper was under review, proves an LDP for the infinite-dimensional
occupancy measures associated with occupancy processes. The present paper
focuses on finite-dimensional occupancy measures in which urn occupancies
above a given level are not distinguished. In this finite case, we are able to 
provide a concise large deviations proof along with explicit, insightful and 
computable expressions for the rate functions and for the large deviations 
extremals. The occupancy model after all the balls have been thrown is
shown to have a simple and fairly explicit rate function, which can be
defined in terms of relative entropy with respect to the Poisson
distribution. Many different problems can be solved in this framework simply
by changing the set over which the rate function is minimized. We also give 
sample path results for the evolution of the urn occupancies toward a partic- 
ular event. In principle, the explicit form of the minimizing trajectories for 
sample path results should enable accurate empirical estimates of unlikely 
events using importance sampling; see [7, 22]. 

Let $\lfloor a \rfloor$ denote the integer part of a scalar $a$. With $n$ cells available and
a total of $r = \lfloor \beta n \rfloor$ balls to be thrown (with $\beta > 0$), we consider the large
deviation asymptotics as $n \to \infty$. The precise statement is as follows. Fix
a positive integer $I$. Then with $\Gamma_i^n(\beta)$ equal to the fraction of cells containing
$i$ balls, we characterize the large deviation asymptotics of the random
vectors $\{(\Gamma_0^n(\beta), \Gamma_1^n(\beta), \ldots, \Gamma_I^n(\beta)), n = 1, 2, \ldots\}$ as $n \to \infty$. A direct analysis
of this problem is difficult, and, in fact, it turns out to be simpler to first
"lift" the problem to the level of a sample path large deviation problem, and
then use the contraction principle to reduce to the original finite-dimensional
problem. A "time" variable $x$ is introduced into the problem, where $\lfloor nx \rfloor$
balls have been thrown at time $x$, and $\Gamma_i^n(x)$ is equal to the fraction of cells
containing $i$ balls at this time. We then follow a standard program: the large
deviation properties of this Markov process are analyzed, the rate function $J$
on path space is obtained, and the rate function for the occupancy at time $\beta$
is then characterized as the solution to a variational problem involving $J$.

Although the program is standard, there are several very interesting
features, both qualitative and technical, which distinguish this large
deviation problem. We first describe some of the attractive qualitative
features. Typically, one has a rate function on path space of the form
$J(\phi) = \int_0^\beta L(\phi(x), \dot\phi(x))\,dx$, where the nonnegative function $L(\gamma, \xi)$ is jointly lower
semi-continuous and convex in $\xi$ for each fixed $\gamma$. The large deviation
properties of the process at time $\beta$ are then found by solving a variational problem
of the form

$$\inf\{J(\phi) : \phi(\beta) = \omega\},$$

where $\omega$ is given and where there will also be constraints on the initial
condition $\phi(0)$. In general, this problem does not have an explicit, closed form
solution. One exception to this rule is the extraordinarily simple situation
where $L(\gamma, \xi)$ does not depend on the state variable $\gamma$. In this case, Jensen's
inequality implies that the minimizing trajectory is a straight line, and so
the variational problem is actually finite dimensional. Another exception is 
the case of large deviation asymptotics of a small noise linear stochastic dif- 
ferential equation. In this case the variational problem takes the form of the 
classical linear quadratic regulator, and the explicit solution is well known 
from the theory of deterministic optimal control. However, in this case there 
is no need to "lift" the problem to the sample path level, since the distribu- 
tion of the diffusion at any time is Gaussian with explicitly calculable mean 
and covariance. Other exceptions occur in large deviation problems from 
queuing theory, but in these problems the variational integrand is "sectionally"
independent of $\gamma$, and one can show that the minimizing trajectories are
the concatenation of a finite number of straight line segments (see, e.g., [1] 
and the references therein). 

For the occupancy problem the variational integrand has a complicated 
state dependence [see (2.2)], reflecting the complicated dependence of the 
transition probabilities (in the process level version) on the state. Nonethe- 
less, the rate function possesses a great deal of structure that can be heavily 
exploited. For example, the function J turns out to be strictly convex on 
path space, and so all local minimizers in the variational problem are actu- 
ally global minimizers. Perhaps even more surprising is the fact that explicit 
solutions to the Euler-Lagrange equations can be constructed, and as a con- 
sequence, the variational problem can be more-or-less solved explicitly (see 
the Appendix). Both of these properties follow from the fact that the vari- 
ational integrand L can be defined in terms of the famous relative entropy 
function (or divergence). 

A technical novelty of the problem is the singular behavior of the transition
probabilities of the underlying Markov process. Since only cells containing
$j$ balls can become cells with $j + 1$, it is clear that the transition
probability corresponding to such an event scales linearly with $\Gamma_j^n(x)$, and
in particular, that it vanishes at the boundary of the state space, when
$\Gamma_j^n(x) = 0$. This poses no difficulty for the large deviations upper bound, but
is an obstacle for the lower bound. (Some results that address lower bounds
when rates go to zero include Chapter 8 of [26], which treats processes with
"flat" boundaries, and recent general results in [27].) For the occupancy
model, existing results provide a lower bound for open sets of trajectories
that do not touch the boundary, because away from the boundaries the set
$\{\xi : L(\gamma, \xi) < \infty\}$ is independent of $\gamma$. To deal with more general open sets
we use a perturbation argument. We first show, using the strict convexity 
of J and properties of the zero cost paths, that it is enough to prove large de- 
viation lower bounds for open neighborhoods of trajectories that stay away 
from the boundary for all positive times. (Note that this still allows the 
trajectory to start on the boundary.) Loosely speaking, to prove the lower 
bound for sets of this type it is enough to show that given $a > 0$, there is
$b > 0$ such that the probability that the process is at least distance $b$ from
the boundary by time $b$ is bounded below by $\exp(-na)$. It turns out that these
bounds can be easily established by exploiting an explicit representation for 
such probabilities that was proved in [11]. 

An outline of the paper is as follows. Section 2 states the main results 
of the paper. In the first part of Section 2 we construct the underlying 
stochastic process model, and state the corresponding sample path level 
LDP, as well as the LDP for the terminal distribution. The proof of the 
sample path LDP is given in the following section, although in Section 2 
an additional heuristic argument for the form of the local rate function 
based on Sanov's theorem is provided. In the second part of Section 2, the 
terminal distribution rate function is characterized and explicit expressions 
for the sample path minimizers are presented. These constitute more-or-less 
complete solutions to the corresponding calculus of variations problem. The 
detailed proofs of these latter results are deferred to the Appendix. 

Section 3 gives a proof of the sample path LDP. The solutions to the 
variational problem are exemplified in Section 4, where we show how specific 
questions regarding the occupancy problem can be answered. In particular, 
we work out the asymptotics for a number of examples, including an overflow 
problem and a partial coupon collection problem. Generalizations of our
results are also described.

2. Main results. 

2.1. Statement of the LDP. In this section we formulate the problem of 
interest and state the LDP. The proof will be given in Section 3. As noted 
in the Introduction, our focus is the asymptotic behavior of the occupancy 
problem. If n denotes the total number of cells, then to have a nontrivial 
limit, the number of balls placed in the cells should scale linearly with n. 
We will place $\lfloor \beta n \rfloor$ balls in the cells, where $\beta \in (0, \infty)$ is a fixed parameter
and $\lfloor a \rfloor$ denotes the integer part of $a$.

As will be seen in the sequel, the large deviation asymptotics of the
occupancy should be treated by first lifting the problem to the level of sample
path large deviations, and then using the contraction principle to reduce to
the original problem. To do this, we introduce a "time" variable $x$ that ranges
from $0$ to $\beta$. At time $x$, one should imagine that $\lfloor nx \rfloor$ balls have been thrown.
Thus, the occupancy process will be piecewise constant over intervals of the
form $[i/n, i/n + 1/n)$. With this scaling of time, large deviation asymptotics
can be obtained if we scale space by a factor of $1/n$. Thus, we define the random
occupancy process $\Gamma^n(x) = (\Gamma_0^n(x), \ldots, \Gamma_I^n(x), \Gamma_{I+}^n(x))$ by letting $\Gamma_i^n(x)$,
$i = 0, \ldots, I$, denote $1/n$ times the number of cells with exactly $i$ balls at
time $x$, and letting $\Gamma_{I+}^n(x)$ be $1/n$ times the number of cells with more than $I$
balls at time $x$. Note that $\Gamma^n$ takes values in the set of probability vectors
on $I + 2$ points: $S_I = \{\gamma \in \mathbb{R}^{I+2} : \gamma_j \ge 0,\ 0 \le j \le I + 1, \text{ and } \sum_{j=0}^{I+1} \gamma_j = 1\}$.

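The process $\{\Gamma^n(i/n), i = 0, 1, \ldots\}$ is obviously Markovian. It will be
convenient to work with the following "dynamical system" representation:

$$\Gamma^n\left(\frac{i+1}{n}\right) = \Gamma^n\left(\frac{i}{n}\right) + \frac{1}{n}\, b_i\left(\Gamma^n\left(\frac{i}{n}\right)\right),$$

where the independent and identically distributed random vector fields
$\{b_i(\cdot), i = 0, 1, \ldots\}$ have distributions

$$P\{b_i(\gamma) = v\} = \begin{cases} \gamma_j, & \text{if } v = e_{j+1} - e_j,\ 0 \le j \le I, \\ \gamma_{I+1}, & \text{if } v = 0. \end{cases}$$

Here $e_j$ represents the vector in $\mathbb{R}^{I+2}$ whose $j$th element is unity and for
which all other elements are zero. The occupancy after $\lfloor \beta n \rfloor$ balls have been
thrown can then be represented by $n\Gamma^n(\beta)$.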
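
The dynamical system representation translates directly into a simulation. The following is a minimal sketch, not taken from the paper: the function names `simulate_occupancy` and `poisson_limit` are illustrative, and the state is stored as integer cell counts $n\Gamma^n$ so that each step applies one increment $e_{j+1} - e_j$ (or no change when the ball lands in a cell that already holds more than $I$ balls).

```python
import math
import random

def simulate_occupancy(n, beta, I, seed=0):
    """Throw floor(beta*n) balls one at a time into n initially empty cells.

    Returns Gamma^n(beta) as a list of I+2 fractions: cells holding
    0, 1, ..., I balls, followed by cells holding more than I balls.
    """
    rng = random.Random(seed)
    counts = [0] * (I + 2)          # n * Gamma^n, kept as integers
    counts[0] = n                   # all cells start empty
    for _ in range(int(beta * n)):
        # The ball lands in a cell of class j with probability gamma_j.
        u = rng.randrange(n)
        j = 0
        while u >= counts[j]:
            u -= counts[j]
            j += 1
        if j <= I:                  # increment e_{j+1} - e_j; zero if j = I+1
            counts[j] -= 1
            counts[j + 1] += 1
    return [c / n for c in counts]

def poisson_limit(beta, I):
    """Law-of-large-numbers limit: Poisson(beta) with the tail lumped together."""
    head = [math.exp(-beta) * beta**i / math.factorial(i) for i in range(I + 1)]
    return head + [1.0 - sum(head)]

gamma = simulate_occupancy(n=2000, beta=1.5, I=3, seed=1)
limit = poisson_limit(1.5, 3)
```

For large $n$ the simulated vector `gamma` should lie close to `limit`; the large deviation results concern the exponentially small probability that it does not.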

As we have discussed, the large deviations behavior of $\Gamma^n(\beta)$ will be
deduced from that of the process $\Gamma^n(\cdot)$. In order to state the large deviation
asymptotics precisely, we should clarify the space in which $\Gamma^n(\cdot)$ takes
values and the topology used on that space. As usual for processes with jumps
of this sort, we use the Skorokhod space $D([0,\beta] : \mathbb{R}^{I+2})$ together with the
Skorokhod topology [5], Chapter 4. However, the large deviations properties
of $\{\Gamma^n\}$ are the same as for the processes $\{\tilde\Gamma^n\}$, where $\tilde\Gamma^n$ is defined to be
the piecewise linear process which agrees with $\Gamma^n$ at times of the form $i/n$.
For readers unfamiliar with the Skorokhod space and associated topology,
the identical large deviations results hold for $\tilde\Gamma^n$, save that the space of
continuous functions and the sup norm topology are used instead.

To complete the statement of the LDP for $\{\Gamma^n\}$ we need some additional
notation. A vector of (deterministic) occupancy rates $\theta(x) = (\theta_0(x), \ldots, \theta_I(x), \theta_{I+1}(x))$
is a measurable mapping from $[0,\beta]$ to $S_I$. Intuitively these rates
represent the rate at which balls flow into urns of a given occupancy level.
Associated with each such vector of rates is the corresponding deterministic
occupancy function $\gamma(x) = (\gamma_0(x), \ldots, \gamma_I(x), \gamma_{I+}(x))$, which is defined
by the initial condition $\gamma(0)$ and the differential equations $\dot\gamma_0(x) = -\theta_0(x)$,
$\dot\gamma_j(x) = \theta_{j-1}(x) - \theta_j(x)$ for $j = 1, \ldots, I$, and $\dot\gamma_{I+}(x) = \theta_I(x)$. These equations
reflect the idea that the fraction of urns containing $i$ balls increases as balls
enter $(i-1)$-occupied urns, and decreases as balls enter $i$-occupied urns.
Defining the $(I+2) \times (I+2)$ matrix

$$M = \begin{pmatrix} -1 & 0 & \cdots & 0 & 0 \\ 1 & -1 & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & -1 & 0 \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix},$$

we can write $\dot\gamma(x) = M\theta(x)$ for all $x \in [0,\beta]$. Conversely, given a differentiable
occupancy function $\gamma$, the corresponding rates are uniquely determined
by $\dot\gamma(x) = M\theta(x)$ and the normalization $\sum_{i=0}^{I} \theta_i(x) + \theta_{I+1}(x) = 1$. We will
also be interested in the cumulative occupancy function $\psi$ with components
$\psi_i(x) = \sum_{j=0}^{i} \gamma_j(x)$. Inspecting the cumulative sums of the rows of $M$ shows
that $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$. As more balls are thrown, the fraction of urns
containing $i$ or fewer balls can only decrease, and the rate of decrease is the
rate at which balls enter $i$-occupied urns. We will say that $\gamma$ is a valid
occupancy function if $\gamma$ is absolutely continuous, if $\gamma(x)$ is a probability vector
for all $x \in [0,\beta]$, and if its associated $\theta(x)$ is a probability vector for almost
all $x \in [0,\beta]$. Note that the functions $\psi$, $\gamma$ and $\theta$ are interchangeable, in the
sense that each can be derived from any of the others [given $\gamma(0)$ in the case
of $\theta$]. Thus, we say that $\theta$ and $\psi$ are valid if the associated $\gamma$ is valid. The
following lemma gives a direct characterization of validity in terms of $\psi$.

Lemma 2.1. A vector of $I + 1$ continuous functions $\psi$, each of which
maps $[0,\beta]$ to $[0,1]$, is a valid cumulative occupancy path if and only if:

(a) $\psi_i(x) \ge \psi_{i-1}(x)$,

(b) $\psi_i(x) \ge \psi_i(y)$,

(c) $\sum_{k=0}^{I} (\psi_k(x) - \psi_k(y)) \le y - x$,

for each $0 \le i \le I$ and $0 \le x \le y \le \beta$.

Sketch of the proof. In the forward direction, the first condition
plus the bound $\psi_I(x) \le 1$ ensure that $\gamma(x)$ is a probability distribution. The
second and third conditions together imply that $\psi$ is Lipschitz continuous
with constant 1 and, hence, absolutely continuous. This implies the absolute
continuity of $\gamma$. Since $\dot\psi = -\theta$, the third condition ensures that $\theta$ is almost
always a probability distribution. The reverse direction proceeds similarly.
□
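
The conditions of Lemma 2.1 are easy to test numerically on a grid. The sketch below is illustrative only: the helper names are hypothetical, and it checks the conditions at grid points and for consecutive grid pairs, which is a necessary but not sufficient test of validity on the whole interval. It is exercised on the cumulative Poisson path, whose components are $\psi_i(x) = \sum_{j \le i} e^{-x} x^j / j!$.

```python
import math

def poisson_pmf(i, t):
    """exp(-t) t^i / i!"""
    return math.exp(-t) * t**i / math.factorial(i)

def psi_poisson(x, I):
    """Cumulative occupancy psi_0(x), ..., psi_I(x) of the Poisson path."""
    out, s = [], 0.0
    for i in range(I + 1):
        s += poisson_pmf(i, x)
        out.append(s)
    return out

def is_valid_on_grid(psi, beta, I, steps=200, tol=1e-12):
    """Check the range of psi and conditions (a)-(c) on a uniform grid."""
    xs = [beta * k / steps for k in range(steps + 1)]
    vals = [psi(x, I) for x in xs]
    for v in vals:
        if any(not (-tol <= p <= 1.0 + tol) for p in v):
            return False
        if any(v[i] < v[i - 1] - tol for i in range(1, I + 1)):       # (a)
            return False
    for k in range(steps):
        px, py = vals[k], vals[k + 1]
        if any(a < b - tol for a, b in zip(px, py)):                  # (b)
            return False
        drop = sum(a - b for a, b in zip(px, py))
        if drop > (xs[k + 1] - xs[k]) + tol:                          # (c)
            return False
    return True
```

A path that increases in $x$ fails (b), and a path whose components decrease too quickly fails (c), since the total decrease of the $\psi_k$ cannot exceed the number of balls thrown per urn.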

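Given $\gamma, \theta \in S_I$, let $D(\theta \| \gamma)$ denote relative entropy of $\theta$ with respect to $\gamma$.
Thus,

$$D(\theta \| \gamma) = \sum_{i=0}^{I+1} \theta_i \log(\theta_i / \gamma_i),$$

with the understanding that $\theta_i \log(\theta_i/\gamma_i) = 0$ whenever $\theta_i = 0$, and that
$\theta_i \log(\theta_i/\gamma_i) = \infty$ if $\theta_i > 0$ and $\gamma_i = 0$. If $\gamma(x)$ is a valid occupancy function
with corresponding occupancy rates $\theta(x)$, then we set

$$J(\gamma) = \int_0^\beta D(\theta(x) \| \gamma(x))\,dx. \tag{2.2}$$

In all other cases set $J(\gamma) = \infty$.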
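
As a rough illustration of (2.2), the sketch below evaluates the relative entropy with the stated conventions and approximates the path cost by a midpoint rule. The function names and the quadrature are assumptions made for illustration, not part of the paper. The example path has $I = 0$, empty initial condition and rates $\theta = (1, 0)$, so that every ball lands in an empty cell and $\gamma(x) = (1 - x, x)$; its cost works out to $\beta + (1-\beta)\log(1-\beta)$ for $\beta < 1$.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma), with 0*log(0/q) = 0 and p*log(p/0) = +inf for p > 0."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0.0:
            continue
        if g == 0.0:
            return math.inf
        total += t * math.log(t / g)
    return total

def path_cost(theta_of_x, gamma_of_x, beta, steps=1000):
    """Midpoint-rule approximation of J(gamma) = int_0^beta D(theta || gamma) dx."""
    h = beta / steps
    return h * sum(
        relative_entropy(theta_of_x((k + 0.5) * h), gamma_of_x((k + 0.5) * h))
        for k in range(steps)
    )

# Every ball lands in an empty cell: theta = (1, 0), gamma(x) = (1 - x, x).
beta = 0.5
cost = path_cost(lambda x: [1.0, 0.0], lambda x: [1.0 - x, x], beta)
exact = beta + (1.0 - beta) * math.log(1.0 - beta)
```

Here `cost` and `exact` agree to within the quadrature error, and a path with $\theta = \gamma$ throughout has zero cost.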

Theorem 2.2. Suppose that the sequence of initial conditions $\{\Gamma^n(0),
n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha \in S_I$ as $n \to \infty$.
Then the sequence of processes $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies the LDP with rate
function $J$. In other words, for any measurable set $A$ of trajectories, we have
the large deviation lower bound

$$\liminf_{n \to \infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \ge -\inf\{J(\gamma) : \gamma \in A^\circ, \gamma(0) = \alpha\}$$

and the large deviation upper bound

$$\limsup_{n \to \infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \le -\inf\{J(\gamma) : \gamma \in \bar{A}, \gamma(0) = \alpha\},$$

and, moreover, for any compact set of initial conditions $K$ and $C < \infty$,
the set

$$\{\gamma : J(\gamma) \le C, \gamma(0) \in K\}$$

is compact.



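The proof of this result is provided in Section 3. However, a formal
justification for the form of the rate function is as follows. Let $\delta > 0$ be a small
time increment. Owing to the fact that $\Gamma^n$ varies slowly when compared
to $\{b_i(\gamma)\}$, one expects

$$P\left\{ \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx M\theta \,\Big|\, \Gamma^n(0) = \gamma \right\} \approx P\left\{ \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} b_j(\gamma) \approx M\theta \right\},$$

where the symbol "$\approx$" inside the probability means that the indicated
quantities are within a small fixed constant $\varepsilon > 0$ of each other. Suppose that we
interpret $\gamma$ as a probability measure on $\{e_0, \ldots, e_{I+1}\}$, and let $\{Y_j\}$ be
independent and identically distributed (i.i.d.) with distribution $\gamma$. Then the
sequence of i.i.d. random vectors $\{b_j(\gamma)\}$ can be realized by setting

$$b_j(\gamma) = \begin{cases} e_{i+1} - e_i, & \text{if } Y_j = e_i,\ 0 \le i \le I, \\ 0, & \text{if } Y_j = e_{I+1}, \end{cases}$$

that is,

$$b_j(\gamma) = M Y_j.$$

By Sanov's theorem, for any probability vector $\theta \in S_I$,

$$P\left\{ \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} Y_j \approx \theta \right\} \approx \exp\left(-n\delta\, D(\theta \| \gamma)\right).$$

Thus,

$$P\left\{ \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx M\theta \,\Big|\, \Gamma^n(0) = \gamma \right\} \approx \exp\left(-n\delta\, D(\theta \| \gamma)\right).$$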




Approximating an arbitrary trajectory by a piecewise linear trajectory with
nearly equal cost and using the Markov property, one expects

$$P(\Gamma^n \approx \gamma \mid \Gamma^n(0) = \alpha) \approx \exp\left(-n \int_0^\beta D(\theta(t) \| \gamma(t))\,dt\right),$$

where $\theta$ and $\gamma$ are related by $\dot\gamma = M\theta$, $\gamma(0) = \alpha$. Thus, the rate function on
path space should be $J(\gamma)$.

The zero cost trajectories are of course the law of large numbers limits,
and can easily be computed. For example, if $\alpha = (1, 0, \ldots)$ (so all cells are
initially empty) and $i \le I$, then $J(\gamma) = 0$ implies that $\gamma_i(t) = e^{-t} t^i / i!$. In
other words, $\gamma(t)$ is the Poisson distribution with mean $t$, save that all mass
corresponding to $i > I$ is collected together into the state $I + 1$. Throughout
this paper we will denote the Poisson distribution with mean $t$ by $\mathcal{P}(t)$,
where $\mathcal{P}_i(t) = e^{-t} t^i / i!$.
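
The zero cost property can be checked numerically: a path has $J(\gamma) = 0$ exactly when its rates satisfy $\theta = \gamma$, that is, when $\dot\gamma = M\gamma$. The sketch below assumes $M$ is the matrix determined by the differential equations $\dot\gamma_0 = -\theta_0$, $\dot\gamma_j = \theta_{j-1} - \theta_j$, $\dot\gamma_{I+} = \theta_I$, and compares a central-difference derivative of the truncated Poisson path with $M\gamma$. The helper names are illustrative.

```python
import math

def build_M(I):
    """(I+2)x(I+2) matrix with gamma' = M theta: -1 on the diagonal and +1 just
    below it for columns 0..I; the last column is zero."""
    size = I + 2
    M = [[0.0] * size for _ in range(size)]
    for j in range(I + 1):
        M[j][j] = -1.0
        M[j + 1][j] = 1.0
    return M

def truncated_poisson(t, I):
    """(P_0(t), ..., P_I(t), remaining tail mass)."""
    head = [math.exp(-t) * t**i / math.factorial(i) for i in range(I + 1)]
    return head + [1.0 - sum(head)]

def max_ode_residual(t, I, h=1e-5):
    """Largest entry of |d/dt gamma(t) - M gamma(t)| for the truncated Poisson path."""
    M = build_M(I)
    g = truncated_poisson(t, I)
    gp = truncated_poisson(t + h, I)
    gm = truncated_poisson(t - h, I)
    rhs = [sum(M[r][c] * g[c] for c in range(I + 2)) for r in range(I + 2)]
    return max(abs((gp[r] - gm[r]) / (2 * h) - rhs[r]) for r in range(I + 2))
```

The residual is at the level of the finite-difference error, consistent with the Poisson path being the law of large numbers limit for empty initial conditions.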

We are primarily interested in the distribution of $\Gamma^n(\beta)$. The contraction
principle (e.g., [11], Theorem 1.3.2) implies the following variational
representation for the rate function for $\{\Gamma^n(\beta), n = 1, 2, \ldots\}$. Let $A(\alpha, \omega, \beta)$ denote
the set of valid occupancy paths $\gamma$ on $[0, \beta]$ satisfying $\gamma(0) = \alpha$ and $\gamma(\beta) = \omega$.

Corollary 2.3. Suppose that the sequence of initial conditions $\{\Gamma^n(0),
n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha$ as $n \to \infty$. Then the
sequence of random vectors $\{\Gamma^n(\beta), n = 1, 2, \ldots\}$ satisfies the LDP with the
convex rate function

$$\mathcal{J}(\omega) = \inf\{J(\gamma) : \gamma \in A(\alpha, \omega, \beta)\}.$$

Remark 2.1. Using the explicit formula for $\mathcal{J}(\omega)$ stated in Theorem 2.7
and the convexity of relative entropy in its first argument, it follows that $\mathcal{J}$ is,
in fact, strictly convex.

Remark 2.2. It is sometimes useful to show that the large deviation 
lower bound holds for sets with no interior relative to the ambient space. 
Such bounds can often be proved for processes, such as ours, that take values 
in a discrete lattice. An example would be a set $A \subset \{\gamma \in S_I : \gamma_{I+} = 0\}$, for
which the minimizing trajectories must be polynomial extremals (defined 
after Theorem 2.6). Although we do not need such results in the present 
paper, it is worth observing that lower bounds of this kind can be proved. 

2.2. Characterization of the terminal rate function and minimizing paths. 
The results of this section were obtained using calculus of variations tech- 
niques, the details of which are given in the Appendix. The presentation 
begins with the case in which all urns are initially empty because it appears
in many applications and because it is a building block for the general case.
In each case, first we give a characterization of $\mathcal{J}(\omega)$ as a minimal relative
entropy, which may be computed explicitly in this form. We then give an
explicit functional form for the sample path minimizer. The functional form
contains parameters determined by the solutions to fixed point equations.

Before stating these theorems, it is helpful to specify the domain over
which $\mathcal{J}(\omega)$ is finite. We define an endpoint constraint to be a triple $(\alpha, \omega, \beta)$,
where $\alpha, \omega \in S_I$ are the initial and terminal occupancies, and where $\beta > 0$
is the number of balls thrown per urn. We assume without loss that $\alpha_0 > 0$,
that is, that some fraction of urns are initially empty, and we denote the
index set of positive initial occupancies by $\mathcal{K} = \{k : \alpha_k > 0\}$. Since we do not
distinguish between urns having more than $I$ balls, we can always suppose
that no urns initially have more than $I + 1$ balls, and denote the last element
of $\alpha$ by $\alpha_{I+1}$. The last element of $\omega$ is denoted $\omega_{I+}$, signifying that it collects
all urns with occupancy greater than or equal to $I + 1$.

Definition 2.1. An endpoint constraint $(\alpha, \omega, \beta)$ is feasible if the
corresponding set of valid occupancy paths $A(\alpha, \omega, \beta)$ is nonempty.

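Lemma 2.4. An endpoint constraint $(\alpha, \omega, \beta)$ is feasible if and only if

$$\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad \text{(monotonicity)} \tag{2.3}$$

and

$$\sum_{i=0}^{I} i\,\omega_i + (I + 1)\,\omega_{I+} \le \sum_{k=0}^{I+1} k\,\alpha_k + \beta \quad \text{(conservation)}. \tag{2.4}$$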

This lemma is proved in the Appendix. Condition (2.3) relates to the fact
that the $\psi_i(x)$ must decrease monotonically and (2.4) to a conservation
constraint for the number of balls thrown. The right-hand side of the inequality
equals the initial number of balls per urn, plus the additional balls per urn
thrown up to time $\beta$, while the left-hand side is a lower bound on the number
of balls per urn at time $\beta$.
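
The two feasibility conditions can be checked mechanically. The following sketch is illustrative (the function name `is_feasible` and the tolerance are assumptions); it takes $\alpha$ and $\omega$ as lists of length $I + 2$, with last entries $\alpha_{I+1}$ and $\omega_{I+}$.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (2.3) and conservation (2.4) for an endpoint constraint."""
    I = len(alpha) - 2
    # (2.3): partial sums of alpha dominate partial sums of omega for i = 0..I.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False
    # (2.4): lower bound on terminal balls per urn <= initial balls per urn + beta.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(k * alpha[k] for k in range(I + 2)) + beta
    return lhs <= rhs + tol
```

For example, with $I = 1$ and empty initial conditions, ending with half of the urns holding exactly one ball requires at least $\beta = 0.5$ balls per urn.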

In the treatment of general initial conditions, the following further defi- 
nition will be useful. 

Definition 2.2. A set of feasible constraints $(\alpha, \omega, \beta)$ is irreducible if
the monotonicity conditions (2.3) hold with strict inequality for all $i \le I$.
Otherwise, the constraints are termed reducible.

For a reducible set of constraints, let $i$ be the first index such that
$\sum_{k=0}^{i} \alpha_k = \sum_{k=0}^{i} \omega_k$. This condition can only be met if no balls are ever
thrown into $i$-occupied urns, and it follows that urns which initially contain
$i$ balls or less will always contain $i$ balls or less. Thus, these urns may
be treated in isolation from the urns containing more than $i$ balls. Furthermore,
by subtracting $i + 1$ balls from each urn in this latter set, we obtain
another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing
in this way, a given problem with constraints $(\alpha, \omega, \beta)$ may be divided
into a finite number of isolated, irreducible subproblems. It is only necessary
therefore to treat problems with irreducible constraints, and wherever
necessary we suppose this to be the case.

2.2.1. Empty initial conditions. An important special case of the initial
constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for
$i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $F(1, \omega, \beta)$ to be the set of distributions $\pi$ on the nonnegative
integers satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$ and the constraint

$$\sum_{i=0}^{\infty} i\,\pi_i = \beta.$$

In the empty case, the conditions for feasibility of $(1, \omega, \beta)$ reduce to
$\sum_{i=0}^{I} i\,\omega_i + (I + 1)\,\omega_{I+} \le \beta$, from which it follows that the set $F$ is nonempty if and only
if $(1, \omega, \beta)$ is feasible. As we now discuss, the infimum of the rate function $J$
over paths $A(1, \omega, \beta)$ can be represented as an infimum of a relative entropy
over distributions in $F(1, \omega, \beta)$.

Theorem 2.5 (Terminal rate function, empty case). Given the initial
condition $\alpha = 1$, the rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}_+$ defined in Corollary 2.3
may be determined as

$$\mathcal{J}(\omega) = \min_{\pi \in F(1, \omega, \beta)} D(\pi \| \mathcal{P}(\beta))$$

if $(1, \omega, \beta)$ is feasible, and is otherwise infinite. The minimizing argument
$\pi^* \in F(1, \omega, \beta)$ is unique.

The above expression is of interest in its own right. Moreover, the optimal
solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one
can show, in fact, that the solution takes the form $\pi_i^* = C\,\mathcal{P}_i(\rho\beta)$ for $i > I$
for some constants $C \ge 0$ and $\rho > 0$ that we refer to as twist parameters.
Here $\rho$ is related to the Lagrange multiplier for the conservation constraint
$\sum_{i=0}^{\infty} i\,\pi_i^* = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty} \pi_i^* = 1$.
These two constraints may be solved to determine $\rho$ and $C$. If we have the
strict equality $\sum_{i=0}^{I} i\,\omega_i + (I + 1)\,\omega_{I+} = \beta$, there are just enough balls to meet
the terminal constraints and we may replace $I$ by $I + 1$ if necessary to ensure
that $\omega_{I+} = 1 - \sum_{i=0}^{I} \omega_i = 0$. In this case, it turns out that $F(1, \omega, \beta)$ has only
one element; we then have $C = 0$, and we may take $\rho = 1$. Otherwise, we
define $\rho$ to be the unique positive root of the equation

$$\frac{\sum_{i > I} i\,\mathcal{P}_i(\rho\beta)}{\sum_{i > I} \mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\,\omega_i}{1 - \sum_{i=0}^{I} \omega_i}. \tag{2.5}$$

Because of the strict inequality in the conservation condition (2.4), the
right-hand side of the last equation is strictly greater than $I + 1$. The left-hand side
of the equation is the conditional mean $E[Y \mid Y > I]$ of a Poisson random
variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a
strictly monotonic and continuous map from $(0, \infty)$ to $(I + 1, \infty)$ and, hence,
the equation has exactly one positive root. $C$ is given by

$$C = \frac{1 - \sum_{i=0}^{I} \omega_i}{\sum_{i > I} \mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\,\omega_i}{\sum_{i > I} i\,\mathcal{P}_i(\rho\beta)}. \tag{2.6}$$

Evaluating the relative entropy of $\pi^*$ and $\mathcal{P}(\beta)$, the rate function of $\Gamma^n(\beta)$
can be expressed as

$$\mathcal{J}(\omega) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \left(1 - \sum_{i=0}^{I} \omega_i\right)\left(\log C + (1 - \rho)\beta\right) + \left(\beta - \sum_{i=0}^{I} i\,\omega_i\right)\log\rho. \tag{2.7}$$
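
The twist parameters and the rate function can be computed with a few lines of code. The sketch below is one possible implementation under stated assumptions, not the authors' procedure: it solves the conditional-mean equation $E[Y \mid Y > I] = (\beta - \sum_{i \le I} i\,\omega_i)/(1 - \sum_{i \le I} \omega_i)$ for $m = \rho\beta$ by bisection, obtains $C$ from the normalization of the tail mass, and then evaluates the relative entropy of $\pi^*$ with respect to the Poisson distribution in closed form. The function names are hypothetical, the series truncation (`terms`) is adequate only for moderate means, and the boundary case of equality in (2.4) with $\omega_{I+} > 0$ is not handled.

```python
import math

def poisson_pmf(i, m):
    """P_i(m) = exp(-m) m^i / i!"""
    return math.exp(-m) * m**i / math.factorial(i)

def tail_mean_and_prob(m, I, terms=400):
    """Return (E[Y | Y > I], P{Y > I}) for Y ~ Poisson(m), m > 0.

    Tail terms are summed relative to P_{I+1}(m), which avoids the
    cancellation in 1 - sum_{i<=I} P_i(m) when m is small.
    """
    term, r0, r1 = 1.0, 0.0, 0.0
    for i in range(I + 1, I + 1 + terms):
        r0 += term
        r1 += i * term
        term *= m / (i + 1)
    p_first = math.exp(-m + (I + 1) * math.log(m) - math.lgamma(I + 2))
    return r1 / r0, p_first * r0

def twist_parameters(omega_head, beta):
    """omega_head = [omega_0, ..., omega_I]; returns (rho, C), empty initial case."""
    I = len(omega_head) - 1
    mass = 1.0 - sum(omega_head)                                  # omega_{I+}
    balls = beta - sum(i * w for i, w in enumerate(omega_head))   # balls left for the tail
    if mass < 1e-14:
        return 1.0, 0.0                                           # polynomial case
    target = balls / mass
    if target <= I + 1:
        raise ValueError("sketch requires strict inequality in (2.4)")
    lo, hi = 1e-12, target          # the conditional mean at m = target exceeds target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail_mean_and_prob(mid, I)[0] < target:
            lo = mid
        else:
            hi = mid
    m = 0.5 * (lo + hi)
    return m / beta, mass / tail_mean_and_prob(m, I)[1]

def terminal_rate(omega_head, beta):
    """Relative entropy of the twisted distribution with respect to Poisson(beta)."""
    rho, C = twist_parameters(omega_head, beta)
    mass = 1.0 - sum(omega_head)
    balls = beta - sum(i * w for i, w in enumerate(omega_head))
    rate = sum(w * math.log(w / poisson_pmf(i, beta))
               for i, w in enumerate(omega_head) if w > 0)
    if C > 0:
        rate += mass * (math.log(C) + (1.0 - rho) * beta) + balls * math.log(rho)
    return rate
```

Two sanity checks are available in closed form. When $\omega$ equals the Poisson law with mean $\beta$, the sketch returns $\rho = C = 1$ and a rate of zero. When $I = 1$, $\beta = 0.5$ and $\omega = (0.5, 0.5, 0)$, every ball must land in an empty cell and the rate is $\beta + (1 - \beta)\log(1 - \beta)$.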



The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$
can also be expressed explicitly in terms of the twist parameters $C$
and $\rho$. The least cost path is of interest for a number of reasons. First
of all, the proof of Theorem 2.5 is obtained by evaluating (2.2), using the
explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a
tool in other problems of interest, such as for determining the rate function
of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight
into the expected behavior of occupancy experiments, conditioned on the
occurrence of a rare event. Third, they allow empirical estimation of rare
event probabilities by change-of-measure importance sampling. A key step
in proving the minimality of the proposed least cost path is to show that
the path satisfies the Euler-Lagrange equations (defined in the Appendix).



Theorem 2.6 (Globally minimizing path, empty case). Suppose that
$(1, \omega, \beta)$ are feasible constraints with empty initial conditions. The infimum of
$J(\gamma)$ over $A(1, \omega, \beta)$ is achieved on the occupancy path $\gamma \in A(1, \omega, \beta)$ defined
by

$$\gamma_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} \left(\omega_k - C\,\mathcal{P}_k(\rho\beta)\right)\left(1 - \frac{x}{\beta}\right)^k, \tag{2.8}$$

$$\gamma_i(x) = \frac{(-x)^i}{i!}\,\gamma_0^{(i)}(x), \qquad 1 \le i \le I, \tag{2.9}$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x),$$

where $\rho > 0$ and $C \ge 0$ are twist parameters associated with the constraints.
In addition, $\gamma$ satisfies the Euler-Lagrange equations.

Note that the entire path $\gamma(x)$ is completely determined by the empty
component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are
the terms in the Taylor expansion of $\gamma_0$ about $x$, $\gamma_0(x + y) = \gamma_0(x) + y\,\gamma_0'(x) + \cdots$,
evaluated at time $0$, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of
a polynomial and a single exponential term, so that the Taylor expansion
always exists. When $C = 0$, there is no exponential term, and we say that $\gamma$ is
a polynomial extremal. Otherwise, when $C > 0$, we have an exponential extremal.
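
The extremal path can be evaluated directly from its definition: $\gamma_0$ is a single exponential plus a polynomial in $(1 - x/\beta)$ of degree at most $I$, and $\gamma_i(x) = \frac{(-x)^i}{i!}\gamma_0^{(i)}(x)$. The sketch below is illustrative: the function name is hypothetical, and the twist parameters `rho` and `C` are taken as inputs that are assumed to have been computed beforehand from the constraints.

```python
import math

def poisson_pmf(i, m):
    """P_i(m) = exp(-m) m^i / i!"""
    return math.exp(-m) * m**i / math.factorial(i)

def extremal_path(x, omega_head, beta, rho, C):
    """Return (gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)) on the minimizing path.

    omega_head = [omega_0, ..., omega_I]; rho and C are the twist parameters.
    """
    I = len(omega_head) - 1
    # Polynomial coefficients omega_k - C * P_k(rho * beta).
    coef = [w - C * poisson_pmf(k, rho * beta) for k, w in enumerate(omega_head)]
    gamma = []
    for i in range(I + 1):
        # i-th derivative of gamma_0 at x: exponential part plus polynomial part.
        d = C * (-rho) ** i * math.exp(-rho * x)
        for k in range(i, I + 1):
            falling = math.factorial(k) / math.factorial(k - i)
            d += coef[k] * falling * (-1.0 / beta) ** i * (1.0 - x / beta) ** (k - i)
        gamma.append((-x) ** i / math.factorial(i) * d)
    gamma.append(1.0 - sum(gamma))
    return gamma
```

With $\rho = C = 1$ and $\omega$ equal to the Poisson law with mean $\beta$, the polynomial coefficients vanish and the path reduces to the Poisson path. With $C = 0$, $I = 1$, $\beta = 0.5$ and $\omega = (0.5, 0.5, 0)$, the path is the polynomial extremal $\gamma(x) = (1 - x, x, 0)$.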



2.2.2. General initial conditions. Occupancy problems with general initial
conditions may be thought of as a coupled set of problems with empty
initial conditions. In particular, we consider the set of urns initially containing
$k$ balls to form a class. The evolution of excess balls (beyond $k$) entering
urns of this class may be denoted by occupancy functions of the form $\gamma_{k,j}(t)$
representing the fraction of urns initially having $k$ balls which have $k + j$
balls at time $t$. The fraction of urns containing $i$ balls in the overall system is
obtained by summing contributions from all subproblem components with
$k + j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the
solution to a minimization problem. Let $S_\infty$ denote the set of distributions
on the nonnegative integers, and let $S_\infty^{|\mathcal{K}|}$ denote the set of $|\mathcal{K}|$-tuples of such
distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will
denote an element of $S_\infty^{|\mathcal{K}|}$ by $\pi = \{\pi_{(0)}, \ldots, \pi_{(k)}, \ldots, \pi_{(I+1)}\}$, where for any
$k \in \mathcal{K}$, a component distribution is denoted $\pi_{(k)} = \{\pi_{k,0}, \pi_{k,1}, \ldots\}$, and it is
understood that the corresponding $\pi_{(k)}$ is omitted if $\alpha_k = 0$. Finally, let
$F(\alpha, \omega, \beta)$ be the set of $\pi \in S_\infty^{|\mathcal{K}|}$ which satisfy the terminal constraints

$$\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\,\pi_{k,i-k} \qquad \text{for all } 0 \le i \le I, \tag{2.10}$$

along with the conservation constraint

$$\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{k,j} = \beta. \tag{2.11}$$

As in the empty case, it may be established that $F$ is nonempty if and only
if $(\alpha, \omega, \beta)$ is feasible.

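Theorem 2.7 (Terminal rate function, general case). The rate function
$\mathcal{J}(\omega) : \mathcal{S}_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be expressed

$$\mathcal{J}(\omega) = \min_{\pi \in \mathcal{F}(\alpha,\omega,\beta)} \sum_{k \in \mathcal{K}} \alpha_k D\bigl(\pi^{(k)} \,\|\, \mathcal{P}(\beta)\bigr)$$

whenever $(\alpha,\beta,\omega)$ are feasible, and is infinite otherwise. The minimizing
argument $\pi^* \in \mathcal{F}(\alpha,\omega,\beta)$ is unique.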

As discussed above, we may suppose the constraints are irreducible, and
then, as shown in the Appendix, Lagrange multipliers will always exist for
this problem. When we have strict inequality in the conservation
condition (2.4) (the exponential case), the solution takes the form

$$\pi^*_{k,j} = \begin{cases} C_k \mathcal{P}_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I, \\ C_k \mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k + j > I. \end{cases}$$

In the case of equality in (2.4) (the polynomial case), the corresponding
form is

$$\pi^*_{k,j} = \begin{cases} D_k \mathcal{P}_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \le I, \\ 0, & k + j > I. \end{cases}$$

As for empty initial conditions, $\rho$ may be associated with the conservation
condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the
$C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_k$ can
all be computed numerically using Lagrangian methods for constrained
optimization (see, e.g., [4]). Given these constants, the optimizing trajectory
$\gamma(t)$ may be constructed explicitly. This construction may be most simply
expressed in terms of the minimizing trajectories in the empty case. For
$k \in \mathcal{K}$, denote the mean of $\pi^{(k)*}$ by $\beta_k = \sum_{j=0}^{\infty} j\,\pi^*_{k,j}$ and define the
terminal condition $\omega^{(k)} \in \mathcal{S}_{I-k}$ by $\omega_{k,j} = \pi^*_{k,j}$, for $j = 0, \ldots, I - k$.
Then the constraints $(1, \omega^{(k)}, \beta_k)$ are feasible constraints, and Theorem 2.6
determines the associated least cost paths, which we denote
$\gamma^{(k)} = \{\gamma_{k,j}(x)\}$. The least cost paths for the subproblems combine to
form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible
feasible constraints $(\alpha,\omega,\beta)$, let $\pi^* \in \mathcal{S}_\infty^{|\mathcal{K}|}$ be the unique
minimizing distribution in Theorem 2.7, and let the functions
$\gamma_{k,j} : [0,\beta_k] \to [0,1]$ be the minimizing paths corresponding to the
subproblems $(1, \omega^{(k)}, \beta_k)$. The infimum of $J(\gamma)$ over $\mathcal{A}(\alpha,\omega,\beta)$ is
achieved on the occupancy path $\gamma \in \mathcal{A}(\alpha,\omega,\beta)$ defined by

$$\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x).$$

In addition, $\gamma$ satisfies the Euler-Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the 
main large deviations result. We recall the processes and notation defined 
at the beginning of Section 2. 

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in \mathcal{S}_I$, define

$$H(\gamma,\zeta) = \log\bigl(E[\exp(\zeta \cdot b_i(\gamma))]\bigr),$$

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded
uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that
$H(\gamma,\zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is
weakly continuous in $\gamma$, $H(\gamma,\zeta)$ is jointly continuous. It follows from [10],
Theorem 4.1, that the sequence $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies a large deviation
upper bound with a rate function $\bar{J}$, which we now define. Let $L$ be the
Legendre-Fenchel transform of $H(\gamma,\zeta)$ in $\zeta$:

$$L(\gamma,\eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma,\zeta)].$$

If $\gamma(x)$, $0 \le x \le \beta$, is an absolutely continuous function that takes values
in $\mathcal{S}_I$, then

$$\bar{J}(\gamma) = \int_0^\beta L(\gamma(x), \dot\gamma(x))\,dx.$$

If $\gamma$ is not absolutely continuous, then $\bar{J}(\gamma) = \infty$. ([10] assumes that the
vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we
can extend the definition to this set with the bound $H(\gamma,\zeta) \le h(|\zeta|)$ and
the continuity of $H(\gamma,\zeta)$ preserved. However, if the process $\Gamma^n$ starts in $\mathcal{S}_I$,
then it stays in $\mathcal{S}_I$, and so the exact form of the extension has no effect on
the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the
proof of the upper bound we must show that $\bar{J} = J$, where $J$ is defined as
in (2.2). This will hold if we can show that $L(\gamma,\eta)$ is finite only when $\eta = M\theta$
for a unique probability vector $\theta$, and that in this case

$$L(\gamma, M\theta) = D(\theta \,\|\, \gamma).$$

It is well known that $L(\gamma,\eta)$ is finite if and only if $\eta$ is in the convex
hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore if
$L(\gamma,\eta) < \infty$, then $\eta$ can be written as a convex combination of the form

$$\theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I),$$

where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $\{(e_{j+1} - e_j),\ j = 0, 1, \ldots, I\}$
are linearly independent, these values are unique. Setting
$\theta_{I+1} = 1 - \sum_{j=0}^{I} \theta_j \ge 0$, we have $\eta = M\theta$ for a unique probability vector $\theta$.
Now assume that $\eta$ takes this form. Then



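$$\begin{aligned}
L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} \bigl[\zeta \cdot M\theta - \log\bigl(E[\exp(\zeta \cdot b_i(\gamma))]\bigr)\bigr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \Bigl[M^T\zeta \cdot \theta - \log\Bigl(\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) + \gamma_{I+1}\Bigr)\Bigr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \Bigl[\sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\Bigl(\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) + \gamma_{I+1}\Bigr)\Bigr].
\end{aligned}$$

Given any values $\mu_0, \ldots, \mu_{I+1}$, we can define $\zeta_0, \ldots, \zeta_{I+1}$ recursively by
$\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions, it is
apparent that the last display is equal to

$$\begin{aligned}
&\sup_{\mu \in \mathbb{R}^{I+2}} \Bigl[\sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}\sum_{j=0}^{I}\theta_j - \log\Bigl(\sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1}) + \gamma_{I+1}\Bigr)\Bigr] \\
&\quad = \sup_{\mu \in \mathbb{R}^{I+2}} \Bigl[\sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\Bigl(\sum_{j=0}^{I+1} \gamma_j \exp\mu_j\Bigr)\Bigr] \\
&\quad = \sup_{\mu \in \mathbb{R}^{I+2}} \Bigl[\sum_{j=0}^{I+1} \mu_j\theta_j - \log\Bigl(\sum_{j=0}^{I+1} \gamma_j \exp\mu_j\Bigr)\Bigr].
\end{aligned}$$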

According to the Donsker-Varadhan variational formula for relative entropy
(e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta \,\|\, \gamma)$, thereby
completing the proof of the upper bound.

We turn now to the proof of the lower bound. In this proof we will
assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that
$\gamma_0(x) = 0$ for all $x$ and, therefore, under this condition the first component
plays no significant role. A proof analogous to the one given below applies
when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive
component of $\gamma(0)$.
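
The variational identity used in this last step can be checked numerically. The sketch below is only an illustration with arbitrarily chosen probability vectors `theta` and `gamma` (these values and the function names are assumptions, not taken from the paper): it evaluates the objective $\sum_j \mu_j\theta_j - \log\sum_j \gamma_j e^{\mu_j}$ at the candidate maximizer $\mu_j = \log(\theta_j/\gamma_j)$ and compares it with $D(\theta \,\|\, \gamma)$.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for finite probability vectors with gamma_j > 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def dv_objective(mu, theta, gamma):
    """Objective sum_j mu_j*theta_j - log(sum_j gamma_j*exp(mu_j))."""
    linear = sum(m * t for m, t in zip(mu, theta))
    log_mgf = math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu)))
    return linear - log_mgf

# Arbitrary illustrative vectors (not from the paper).
theta = [0.5, 0.3, 0.2]
gamma = [0.2, 0.5, 0.3]

# Candidate maximizer mu_j = log(theta_j / gamma_j).
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
d_value = relative_entropy(theta, gamma)
sup_value = dv_objective(mu_star, theta, gamma)

# No randomly drawn mu should exceed the value attained at mu_star.
rng = random.Random(0)
best_random = max(
    dv_objective([rng.uniform(-5.0, 5.0) for _ in theta], theta, gamma)
    for _ in range(1000)
)
print(d_value, sup_value, best_random)
```

The objective is invariant under adding a constant to every $\mu_j$, so the maximizer is unique only up to such a shift.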

Let $P_{\Gamma^n(0)}$ [resp. $E_{\Gamma^n(0)}$] denote probability (resp. expected value) given
a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower
bound, it suffices to show that given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such
that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

(3.1)  $\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Bigl(\sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta\Bigr) \ge -J(\gamma) - \varepsilon.$

Of course this inequality is trivial if $J(\gamma) = \infty$, and so we assume that
$J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of
the transition rates of the process when $\Gamma^n$ is near the boundary of $\mathcal{S}_I$.
We first show that this can be avoided at all times save $x = 0$. To do this,
we show that for any $a > 0$, there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy
function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) \ge bx^K$
for all $j = 0, 1, \ldots, I, I+$ and $0 \le x \le \beta$, and such that

$$J(y) \le J(\gamma).$$

Consider the zero cost trajectory defined by

$$\dot{z}(x) = Mz(x), \qquad z(0) = \gamma(0).$$

We have the following expression for $z_j$ when $j \le I$:

(3.2)  $z_j(x) = \Bigl[\sum_{k=0}^{j} \gamma_k(0)\,\frac{x^{j-k}}{(j-k)!}\Bigr] e^{-x}.$



It is easy to check from this explicit formula that $z_j(x) \ge bx^K$ for some
$b > 0$, $K = I$, and all $j = 0, 1, \ldots, I, I+$ and $0 \le x \le \beta$. For $\rho \in (0,1)$, let
$y^\rho = \rho z + (1 - \rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to
the rate $\rho z + (1 - \rho)\theta$. Using the joint convexity of relative entropy in both
variables ([11], Lemma 1.4.3(b)) and the fact that $D(z \,\|\, z) = 0$, we have

$$\begin{aligned}
J(y^\rho) &= \int_0^\beta D\bigl(\rho z(x) + (1 - \rho)\theta(x) \,\|\, \rho z(x) + (1 - \rho)\gamma(x)\bigr)\,dx \\
&\le \rho \int_0^\beta D(z(x) \,\|\, z(x))\,dx + (1 - \rho)\int_0^\beta D(\theta(x) \,\|\, \gamma(x))\,dx \\
&= (1 - \rho)J(\gamma).
\end{aligned}$$

All required properties are then obtained by letting $y = y^\rho$ for suitably small
$\rho \in (0,1)$.

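It follows that in proving the lower bound, we can assume without loss of
generality that for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_j(x) \ge bx^K$ for all
$x \in [0,\beta]$. We now return to the proof of the lower bound. Our first objective
is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$
(for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0,\beta]$, $\sigma > 0$,
and $\varepsilon > 0$, define

$$h(\gamma) = \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2, \\ 2\varepsilon, & \text{else.} \end{cases}$$

For $n$ large enough that $|\gamma(\lfloor n\tau \rfloor/n) - \gamma(\tau)| < \sigma/2$ (and independent of
$\tau \in (0,\beta]$), we have the inequality

$$\begin{aligned}
P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma\bigr) + e^{-2n\varepsilon}
&\ge P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\tau)| < \sigma/2\bigr) + e^{-2n\varepsilon} \\
&\ge E_{\Gamma^n(0)}\bigl(\exp -nh(\Gamma^n(\lfloor n\tau \rfloor/n))\bigr).
\end{aligned}$$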

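We next exploit a representation for exponential integrals that will give us
an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$
constructed as follows. The process dynamics are of the same general structure
as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar{b}_i^n$:

$$\bar\Gamma^n\Bigl(\frac{i+1}{n}\Bigr) = \bar\Gamma^n\Bigl(\frac{i}{n}\Bigr) + \frac{1}{n}\bar{b}_i^n, \qquad \bar\Gamma^n(0) = \Gamma^n(0).$$

Furthermore, the distribution of $\bar{b}_i^n$ is allowed to depend in any measurable
way upon the set of values $\{\bar\Gamma^n(j/n),\ 0 \le j \le i\}$. Let $\mu_\gamma$ denote the
distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies)
let $\bar\mu_i^n$ denote the (random) distribution of $\bar{b}_i^n$, given
$\{\bar\Gamma^n(j/n),\ 0 \le j \le i\}$. We let $\bar{E}_{\Gamma^n(0)}$ denote expectation on the space that
supports these processes. It follows from [11], Theorem 4.3.1, that

$$-\frac{1}{n}\log E_{\Gamma^n(0)}\Bigl(\exp -nh\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr)\Bigr)
= \inf \bar{E}_{\Gamma^n(0)}\Bigl[h\Bigl(\bar\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr) + \frac{1}{n}\sum_{j=0}^{\lfloor n\tau \rfloor - 1} D\bigl(\bar\mu_j^n \,\|\, \mu_{\bar\Gamma^n(j/n)}\bigr)\Bigr],$$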

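where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower
bound, we now simply insert a particular choice for the random variables $\bar{b}_i^n$.
We can write $\gamma(\tau) - \gamma(0) = Mv\tau$ for some probability vector $v$. Define a
process $\bar{b}_i^n$ as follows:

$$\bar{b}_i^n = \begin{cases}
e_{j+1} - e_j, & \text{if } \bigl\lfloor \sum_{k=0}^{j-1} v_k\tau n \bigr\rfloor \le i \le \bigl\lfloor \sum_{k=0}^{j} v_k\tau n \bigr\rfloor - 1, \text{ for } 0 \le j \le I, \\
0, & \text{if } \bigl\lfloor \sum_{k=0}^{I} v_k\tau n \bigr\rfloor \le i \le \lfloor \tau n \rfloor - 1.
\end{cases}$$

In other words, $\bar{b}_i^n$ defines a deterministic, discrete time approximation to
the continuous time occupancy rate process that uses $e_1 - e_0$ for an amount
of time $v_0\tau$, $e_2 - e_1$ for an amount of time $v_1\tau$ and so on. This continuous
time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$.
If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu_i^n$ concentrates
its mass on $e_j - e_{j-1}$, the cost is

(3.3)  $D\bigl(\bar\mu_i^n \,\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) = \log\frac{1}{\bar\Gamma^n_{j-1}(i/n)}.$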



The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence
properties. Over the block of time steps $i$ at which $e_j - e_{j-1}$ is used,
$\bar\Gamma^n_{j-1}((i+1)/n) - \bar\Gamma^n_{j-1}(i/n) = -1/n$, so that $\bar\Gamma^n_{j-1}(i/n)$ decreases
monotonically to its value at the end of the block. In addition, because the
$(j-1)$st component is never modified after this block, it follows that this
final value converges to $\gamma_{j-1}(\tau)$ as $n \to \infty$ and $\eta \to 0$. Furthermore, as
observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that
$\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \ldots, I$. Thus, at any given time step $i$ we have a
strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn
provides a finite upper bound on the corresponding relative entropy cost.
Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$
and small $\eta > 0$, there are $C_1, C_2 < \infty$ (independent of $\tau$) such that whenever
$|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$,

$$D\bigl(\bar\mu_i^n \,\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) \le C_1[-\log\tau^K] \le -C_2\log\tau.$$

In addition, as $n \to \infty$ and $\eta \to 0$,

$$\bar\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr) \to \gamma(\tau).$$


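By the Lebesgue dominated convergence theorem, for all sufficiently large $n$
and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

$$-\frac{1}{n}\log E_{\Gamma^n(0)}\Bigl(\exp -nh\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr)\Bigr) \le -C_2\tau\log\tau.$$

We now choose $\tau > 0$ so that $-C_2\tau\log\tau < \varepsilon/2$. Choosing $\tau > 0$ smaller if
need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| < \delta$ for all $x \in [0,\tau]$ w.p.1
if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is
therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small
$\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

$$\liminf_{n \to \infty} \frac{1}{n}\log P_{\Gamma^n(0)}\Bigl(\Bigl|\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr) - \gamma\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr| < \sigma,\ \sup_{x \in [0,\tau]} |\Gamma^n(x) - \gamma(x)| < \delta\Bigr) \ge -\frac{\varepsilon}{2}.$$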

Note that the asymptotic lower bound on the normalized log of the
probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0,\beta]$,
we will use the Markov property and an existing lower bound for paths
which avoid the boundary. This latter lower bound will hold uniformly in
a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori
how small this neighborhood must be, it is important that the lower bound
in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the
boundary of $\mathcal{S}_I$ for all $x \in [\tau,\beta]$. Recall that when considered as a
function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and
moreover that the support of this distribution is independent of $\gamma$ so long
as $\gamma_i > 0$ for all $i \in \{0, 1, \ldots, I+1\}$ (i.e., $\gamma \in \mathcal{S}_I^\circ$, where $\mathcal{S}_I^\circ$ denotes interior
relative to the smallest affine space that contains $\mathcal{S}_I$). It then follows from
Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding
uniformity) that $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies the following uniform large
deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$
such that as long as $|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma$,

$$\liminf_{n \to \infty} \frac{1}{n}\log P_{\Gamma^n(\lfloor n\tau \rfloor/n),\,\lfloor n\tau \rfloor/n}\Bigl(\sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta\Bigr) \ge -J(\gamma) - \frac{\varepsilon}{2},$$

where $P_{\Gamma^n(\lfloor n\tau \rfloor/n),\,\lfloor n\tau \rfloor/n}$ denotes probability given the occupancy levels
$\Gamma^n(\lfloor n\tau \rfloor/n)$ at time $\lfloor n\tau \rfloor/n$. Proposition 6.6.1 of [11] assumes
Condition 6.3.2. It is worth noting that in the present setting this condition holds
with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last 
two displays. The proof that J has compact level sets is as in [10], and 
therefore omitted. 



4. Examples and extensions. In this section we apply the results of the 
previous sections to three different occupancy problems. We show how the 
parameters of interest may be computed numerically and plot the solutions 
to the associated calculus of variations problems. In Section 4.4 we list some 
other asymptotic problems of interest which can be solved by relatively 
straightforward generalizations of the results presented in this paper. 
In the calculus of variations problems solved in Section 2, precise initial
and terminal points were always given. In typical applications, one is inter- 
ested in the minimum value of the rate function over a constraint set. When 
the constraint set is sufficiently simple, such problems may still be solved 
easily using the tools provided in Section 2.2. The three problems of this 
section are of this type. 

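Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$
should lie in a terminal constraint set $\Omega$. To apply the LDP for $\{\Gamma^n(\beta)\}$
established by Corollary 2.3, one must compute exponents of the form

$$\mathcal{J}(\Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega).$$

Using Theorem 2.7, we can write

(4.1)  $\mathcal{J}(\Omega) = \inf_{\omega \in \Omega}\ \inf_{\pi \in \mathcal{F}(\alpha,\omega,\beta)} \sum_{k=0}^{K} \alpha_k D\bigl(\pi^{(k)} \,\|\, \mathcal{P}(\beta)\bigr) = \inf_{\pi \in \mathcal{F}(\alpha,\Omega,\beta)} \sum_{k=0}^{K} \alpha_k D\bigl(\pi^{(k)} \,\|\, \mathcal{P}(\beta)\bigr),$

where we have abused notation to define $\mathcal{F}(\alpha,\Omega,\beta) = \bigcup_{\omega \in \Omega} \mathcal{F}(\alpha,\omega,\beta)$.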

In many cases, for example, when the terminal set $\Omega$ is convex and defined
by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1)
using Lagrange multipliers. That is, one solves minimization problems of the
type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced
by constraints defining $\Omega$. This is the approach used in the second and
third examples below. We do not prove that appropriate Lagrange multipliers
always exist; if needed, existence may be established using methods similar
to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity
of relative entropy in (4.1), a local minimum is always a global minimum over
convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient
to establish a local minimum by numerically computing a set of Lagrange
multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the
sample path level and use natural boundary conditions on the extremal
curves (see [25]).

A set $\Omega$, with interior $\Omega^\circ$ and closure $\bar\Omega$, is a $\mathcal{J}$-continuity set if and only
if $\inf_{\omega \in \Omega^\circ} \mathcal{J}(\omega) = \inf_{\omega \in \bar\Omega} \mathcal{J}(\omega)$. For such sets the large deviations lower and
upper bounds coincide so that $-\lim \frac{1}{n}\log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega)$. It
may readily be verified in each of the following examples that the event of
interest is a $\mathcal{J}$-continuity set.
4.1. The classical occupancy problem. In the classical occupancy problem,
the urns are initially empty and one only distinguishes between empty
and occupied urns, or in other words, $I = 0$. The associated large deviations
problem was solved using a sample path approach in [28]; we show here
how this case may be obtained with our results. We might be interested in
the probability of having an unusually large number of empty urns,
$\Gamma_0^n(\beta) \ge \omega_0 > e^{-\beta}$, or an unusually small number of empty urns,
$\Gamma_0^n(\beta) \le \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that
in which $\gamma_0(\beta) = \omega_0$. From (2.6), it is immediate that $C = 1/\rho$ and

$$\gamma_0(x) = \frac{1}{\rho}e^{-\rho x} + 1 - \frac{1}{\rho}.$$

Using (2.5), we find that $\rho$ is determined by the unique positive solution
to

$$\rho(1 - \omega_0) = 1 - e^{-\rho\beta}.$$

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$
and $\rho$.
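
The fixed point equation for $\rho$ has no closed-form solution, but it is easy to solve numerically. The following Python sketch uses plain bisection and assumes $\beta > 1 - \omega_0$, so that a positive root exists; the function names `solve_rho` and `gamma0` are hypothetical. The values $\omega_0 = 0.15$ and $\beta = 3$ are those of the example shown in Figure 1.

```python
import math

def solve_rho(omega0, beta, tol=1e-12):
    """Positive root of rho*(1 - omega0) = 1 - exp(-rho*beta), by bisection.

    Assumes beta > 1 - omega0. Then f(rho) = 1 - exp(-rho*beta) - rho*(1 - omega0)
    is concave with f(0) = 0 and f'(0) > 0, so f > 0 just to the right of 0,
    and f < 0 once rho exceeds 1/(1 - omega0).
    """
    def f(rho):
        return 1.0 - math.exp(-rho * beta) - rho * (1.0 - omega0)

    lo, hi = 1e-9, 1.0 / (1.0 - omega0) + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies to the left of mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Empty-urn fraction along the constrained path, with C = 1/rho."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

rho = solve_rho(omega0=0.15, beta=3.0)
print(rho, gamma0(0.0, rho), gamma0(3.0, rho))
```

In the zero-cost case $\omega_0 = e^{-\beta}$ the same routine returns $\rho = 1$, up to the bisection tolerance.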




[Figure 1: plot of cumulative urn occupancy against balls per urn, x. Legend: zero-cost paths, constrained paths.]



Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_4$ for the classical occupancy problem,
including unconstrained paths (dashed curves) and constrained paths (solid curves).
In addition, our analysis gives the sample paths for the higher occupancies,
conditioned on an unusual number of empty urns, namely, $\gamma_i(x) = \mathcal{P}_i(\rho x)/\rho$
for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of
balls per urn thrown $x$. In this example, the terminal fraction of empty urns
$\omega_0 = 0.15$ is three times larger than the expected value $\mathcal{P}_0(3.0) = 0.05$.

4.2. The overflow problem. In the overflow problem an urn is considered
full when it reaches a finite capacity $I > 0$. Once an urn has been filled,
successive balls thrown to that urn fall to the floor. For specificity, we consider
the problem of determining the probability that an unusually large number
of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following
way. The number of balls that would have fallen on the floor in a finite
capacity system is the number of balls in urns with occupancy greater than $I$,
minus $I$ times the number of such urns. When $r$ balls have been thrown, the
random number of overflowing balls $w(r)$ is thus

$$w(r) = r - n\Gamma_1^n(r/n) - 2n\Gamma_2^n(r/n) - \cdots - In\Gamma_I^n(r/n) - In\Gamma_{I+}^n(r/n)$$

or, since $\Gamma_{I+}^n = 1 - \sum_{i=0}^{I} \Gamma_i^n$,

$$w(r) = r - nI + n\sum_{i=0}^{I} (I - i)\Gamma_i^n(r/n).$$
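
The identity for $w(r)$ is easy to confirm on a simulated configuration. The Python sketch below is only an illustration (the function names and the choices $n = 50$, $r = 120$ are arbitrary assumptions): it throws balls uniformly at random into infinite-capacity urns, then compares the overflow counted urn by urn with the value given by the occupancy formula.

```python
import random
from collections import Counter

def overflow_direct(counts, capacity):
    """Balls on the floor: the excess over capacity, summed over urns."""
    return sum(max(c - capacity, 0) for c in counts)

def overflow_from_occupancy(counts, capacity):
    """Same quantity via w(r) = r - n*I + n * sum_{i<=I} (I - i) * Gamma_i."""
    n, r = len(counts), sum(counts)
    urns_with = Counter(counts)  # urns_with[i] equals n * Gamma_i
    return r - n * capacity + sum(
        (capacity - i) * urns_with[i] for i in range(capacity + 1)
    )

def throw_balls(n, r, seed=0):
    """Place r balls independently and uniformly into n infinite-capacity urns."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(r):
        counts[rng.randrange(n)] += 1
    return counts

counts = throw_balls(n=50, r=120)
for capacity in (1, 2, 3):
    print(capacity, overflow_direct(counts, capacity),
          overflow_from_occupancy(counts, capacity))
```

Both columns agree for every capacity, because the two expressions are algebraically identical for any configuration of balls.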

In order to compute

$$J_O(\eta,\beta) = -\lim_{n} \frac{1}{n}\log P\Bigl(\frac{w(\beta n)}{n} \ge \eta\Bigr),$$

we therefore consider sample paths which satisfy the end constraint

(4.2)  $\sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta.$

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which
must satisfy the bounds $[I - \beta]^+ \le \zeta \le I$, and that the average overflow
satisfies $[\beta - I]^+ \le \eta \le \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would
be expected in the zero cost case, the minimum large deviations exponent
will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$,
we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$
will be given by minimizing the divergence between $\pi$ and $\mathcal{P}(\beta)$, under the
linear equality constraints

(4.3)  $\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad\text{and}\qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta.$
i=0 i=0 i=0 
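We introduce the Lagrangian

$$\mathcal{L}(\pi; y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr) + \lambda\Bigl(\zeta - \sum_{i=0}^{I} (I - i)\pi_i\Bigr)$$

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the
minimizing distribution $\pi^*$ should satisfy the conditions

$$\log\pi_i^* = \log\mathcal{P}_i(\beta) - 1 + y + iz + (I - i)^+\lambda.$$

In terms of variables $C$, $\rho$ and $\nu$, we may write

$$\pi_i^* = C\mathcal{P}_i(\rho\beta) \qquad\text{for } i \ge I$$

and

$$\pi_i^* = C\mathcal{P}_i(\rho\beta)\nu^{I-i} \qquad\text{for } i < I.$$

The distribution is conditionally Poisson $\rho\beta/\nu$ for $i < I$ and conditionally
Poisson $\rho\beta$ for $i \ge I$. For convenience, we introduce the notation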

$$Q_I(\rho) = \sum_{i=I}^{\infty} \mathcal{P}_i(\rho\beta) = \frac{1}{\rho\beta}\sum_{i=I+1}^{\infty} i\mathcal{P}_i(\rho\beta),$$

$$R_I(\rho,\nu) = \sum_{i=0}^{I-1} \nu^{I-i}\mathcal{P}_i(\rho\beta) = \frac{\nu}{\rho\beta}\sum_{i=0}^{I} i\nu^{I-i}\mathcal{P}_i(\rho\beta).$$

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$,
$\rho$ and $\nu$ must solve the equations

$$C\bigl(R_I(\rho,\nu) + Q_I(\rho)\bigr) = 1,$$
$$C\Bigl(\frac{\rho\beta}{\nu}R_I(\rho,\nu) + \rho\beta Q_I(\rho)\Bigr) = \beta,$$
$$C\Bigl(\Bigl(I - \frac{\rho\beta}{\nu}\Bigr)R_I(\rho,\nu) + I\mathcal{P}_I(\rho\beta)\Bigr) = \zeta.$$

There can be at most one positive triple $(C,\rho,\nu)$ satisfying these equations,
since each such triple identifies a local minimum of $D(\pi \,\|\, \mathcal{P}(\beta))$ for $\pi$ in a
convex set, and there can be only one such minimum. The equations can be
solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example,
from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting
this expression into the second and third constraints, we obtain the equations

$$(\rho/\nu - 1)R_I(\rho,\nu) + (\rho - 1)Q_I(\rho) = 0,$$
$$(I - \rho\beta/\nu - \zeta)R_I(\rho,\nu) - \zeta Q_I(\rho) + I\mathcal{P}_I(\rho\beta) = 0.$$

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the
intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger
than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected
values of $\zeta$ give $\nu < \rho < 1$.

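The large deviations exponent for the overflow problem may then be
expressed

$$\begin{aligned}
J_O(\eta,\beta) &= D(\pi^* \,\|\, \mathcal{P}(\beta)) \\
&= \sum_{i=0}^{\infty} \pi_i^* \log\frac{\pi_i^*}{\mathcal{P}_i(\beta)} \\
&= \log C + \beta(1 - \rho + \log\rho) + \zeta\log\nu.
\end{aligned}$$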

4.3. Partial coupon collection, with initial conditions. In the coupon
collector's problem, the urns represent the $n$ types of coupons that are required
to form a complete collection. The placement of a ball in a given urn
corresponds to choosing a new coupon at random, and the problem is to see how
many coupons must be collected before $I + 1$ complete sets are obtained.
This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from
nonempty initial conditions (a collection already in progress), we collect $\beta n$
additional coupons with the goal of obtaining more than $I$ coupons of as
many types as possible. We want to determine how likely it is that the number
of types for which we have collected $I$ or fewer coupons is less than $\zeta n$.

In terms of the urn problems we have considered, we are given initial
occupancies $\alpha$ and wish to compute

$$J_C(\alpha,\beta,\zeta) = -\lim_{n \to \infty} \frac{1}{n}\log P\Bigl(\sum_{i=0}^{I} \Gamma_i^n(\beta) \le \zeta\Bigr),$$

where

$$\zeta < \sum_{k=0}^{K} \sum_{i=0}^{I-k} \alpha_k \mathcal{P}_i(\beta)$$

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in
Theorem 2.7 subject to the conservation constraint (2.11), replacing the terminal
conditions (2.10) with the single constraint

$$\sum_{k=0}^{K} \sum_{i=0}^{I-k} \alpha_k \pi_{k,i} = \zeta,$$

where $K \le I$ since any sets which are initially complete may be left out of
the problem. After constructing a Lagrangian and differentiating, we find
that the minimizing solution must be of the form

$$\pi^*_{k,j} = \begin{cases} C_k \mathcal{P}_j(\rho\beta) W, & j + k \le I, \\ C_k \mathcal{P}_j(\rho\beta), & j + k > I. \end{cases}$$

As in the previous example, the unknown constants may be determined by
substituting the given form of the solution into the constraint equations,
and solving the resulting system of equations. In terms of these constants,
the large deviations exponent may be expressed

$$J_C(\alpha,\beta,\zeta) = \beta(1 - \rho + \log\rho) + \zeta\log W + \sum_{k=0}^{K} \alpha_k \log C_k.$$

fc=0 




[Figure 2: plot of cumulative urn occupancy against balls per urn, x.]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$, for a partial coupon collector's problem,
including unconstrained paths (dashed curves) and constrained paths (solid curves).
Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular
example. Suppose that there are $n = 100$ types of coupons to collect, and
the goal is to collect at least four coupons of as many types as possible
(i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs
of coupons of a further 20 types, corresponding to the initial conditions
$\alpha = [0.5\ 0.3\ 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$.
Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would
expect to have 4 or more coupons for only about 29 types. To compute the
likelihood that we have at least 4 coupons for more than 45 types, we take
$\zeta = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability
of about $10^{-8}$. The corresponding constrained occupancy curves are depicted
by solid lines in Figure 2.
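
The zero-cost figures quoted in this example can be reproduced with a few lines of Python. The sketch below assumes, as in the zero-cost solution, that each urn receives a Poisson($\beta$) number of additional balls regardless of its initial occupancy; the function names are hypothetical. It also converts the quoted exponent into the crude estimate $e^{-nJ_C}$, using the value $J_C \approx 0.18$ taken from the text rather than recomputed here.

```python
import math

def poisson_cdf(m, lam):
    """P(Poisson(lam) <= m); returns 0 when m < 0."""
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(m + 1))

def zero_cost_psi(level, beta, alpha):
    """Expected fraction of urns holding at most `level` balls at time beta.

    alpha[k] is the initial fraction of urns containing k balls; such an urn
    ends with at most `level` balls if it gains at most level - k more.
    """
    return sum(a * poisson_cdf(level - k, beta) for k, a in enumerate(alpha))

alpha = [0.5, 0.3, 0.2]
psi3 = zero_cost_psi(3, 2.0, alpha)   # roughly 0.71, as quoted in the text
types_with_four = 100 * (1.0 - psi3)  # roughly 29 of the n = 100 types

# Crude large deviations estimate exp(-n * J_C), with J_C = 0.18 from the text.
approx_prob = math.exp(-100 * 0.18)
print(psi3, types_with_four, approx_prob)
```

The estimate $e^{-nJ_C}$ ignores subexponential prefactors, so it should be read as an order of magnitude only.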

4.4. Extensions. There are a number of variations of the basic occupancy 
problem which can be solved by fairly straightforward generalizations of the 
results of this paper, but which we will only mention briefly. These include 
the following problems: 

(i) a random number of balls are thrown, 

(ii) balls have a probability p of not entering any urn, 

(iii) balls enter different subsets of urns with differing probabilities, 

(iv) an event of interest may occur at any time in the interval $(0,\beta]$,
rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13],
where the number of balls thrown, $r$, was binomially distributed with
parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii).
Here, the goal is to determine the distribution of the number of targets hit
when $r$ shots are fired at $n$ targets, and when the probability of missing
the target is $p$. In problem (iii) there are $K > 1$ urn classes with a
fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed
probability $p_k$ but then enter any urn within that class with uniform
probability. Similar analysis to that for nonempty initial conditions can be applied
to this problem. For an example of type (iv), suppose that an infinite
sequence of balls is thrown into $n$ initially empty urns, and that one would
like to know the probability that the number of urns containing exactly one
ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have
been thrown can be bounded above by the probability that the number of
empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have
been thrown. This computation fits into the framework of a partial coupon
collector's problem, and the probability can be made negligible from the
point of view of large deviations by taking $\beta$ sufficiently large. The
remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is
then a problem of type (iv). The associated calculus of variations problem
is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with
$\gamma_1(x) \ge 0.5$ for some $x \in (0,\beta]$.



APPENDIX: Analysis of the calculus of variations problem. The Appendix is
dedicated to proving the calculus of variations results given in Section 2.2.
Recall that these results provide explicit representations for the terminal rate
function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy
functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy
paths, that is, paths which satisfy the Euler-Lagrange equations. For all
feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that
occupancy paths of the form given in Theorem 2.6 are extremals and that
paths of this form can be constructed to meet any feasible constraints of the
form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy
paths of the form given in Theorem 2.8 are extremals, and such paths can be
constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form
of the extremals is used in Theorem A.5 and Theorem A.13 to show that
the extremals have the costs given in Theorem 2.5 and Theorem 2.7,
respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints
$(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in
Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by
a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed
to prove Theorems 2.5-2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is
proved in Theorem A.14, using the Euler-Lagrange equations together with
properties of the relative entropy.

A.1. Preliminaries.

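A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint
constraints $(\alpha,\omega,\beta)$ are feasible if and only if

(A.4)  $\sum_{j=0}^{i} \omega_j \le \sum_{j=0}^{i} \alpha_j, \qquad i = 0, \ldots, I$  (monotonicity)

and

(A.5)  $(I + 1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j) \le \beta$  (conservation).
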
Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0,\beta] \to \mathcal{S}_I$ meets
the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1
implies (A.4) and property (c) implies (A.5) since

(A.6)  $\sum_{i=0}^{I}\Bigl(\sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j\Bigr) = (I + 1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).$

On the other hand, given the constraints (A.4) and (A.5), one can show that
the linear functions

$$\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I,$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}$$

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b)
are immediate, and property (c) will be established by showing that
$-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6)
and, therefore, $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5). □

A.3. Euler-Lagrange equations. Given the numerous descriptions of
occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, etc.), it is convenient to abuse
notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the
understanding that the fundamental object of interest is the occupancy
function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy
process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can
define the local rate function (which is usually written as a function of $\psi$ and
$\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative
occupancy trajectory $\psi$ as an integral of the form

$$J(\psi) = \int_0^\beta L(\psi(x),\theta(x))\,dx.$$

Because the balls are thrown uniformly and randomly into the urns, the
expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As
discussed in Section 2, the cost of a deviation of a given path $\psi$ from its
expected behavior at a given instant is given by the rate function

$$\begin{aligned}
L(\psi,\theta) &= D(\theta \,\|\, \gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \Bigl(1 - \sum_{i=0}^{I} \theta_i\Bigr)\log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.
\end{aligned}$$
The rate function is defined to be infinity if the curve is not a cumulative
occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost
among all paths satisfying given initial conditions and endpoint constraints,
and to find the cost of such a minimal path. As illustrated in the examples
in Section 4, the results extend to cases where the terminal point (or the
initial point) is required to lie in a given constraint set.

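Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an
extremal if it satisfies the Euler-Lagrange equations [8],

$$\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x)) = -\frac{d}{dx}\Bigl(\frac{\partial L}{\partial\theta_i}(\psi(x),\theta(x))\Bigr)$$

for all $i \in \{0, \ldots, I\}$, $x \in (0,\beta)$.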

Although the Euler-Lagrange equations are neither necessary nor suf- 
ficient conditions for minimality in general, extremals do turn out to be 
minimal in many cases. In the following sections we will construct a fam- 
ily of extremal paths for the cost function given above, and show that the 
extremal paths are, in fact, globally minimal. 

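In the case at hand, the Euler-Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i-\psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1}-\psi_i}
 = \frac{d}{dx}\left[-\log\frac{\theta_i}{\psi_i-\psi_{i-1}} + \log\frac{\theta_{I+}}{1-\psi_I}\right]
\tag{A.7}
\]

for $i = 0,\dots,I-1$, and by

\[
-\frac{\theta_I}{\psi_I-\psi_{I-1}} + \frac{\theta_{I+}}{1-\psi_I}
 = \frac{d}{dx}\left[-\log\frac{\theta_I}{\psi_I-\psi_{I-1}} + \log\frac{\theta_{I+}}{1-\psi_I}\right].
\tag{A.8}
\]

In the case when we have equality in the conservation constraint (2.4), and
by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i = 1$. Such cases of
equality are referred to as the polynomial case. When this holds, every valid
occupancy path must have $\psi_I(x) = 1$ and, therefore, $\theta_I(x) = \theta_{I+}(x) = 0$. For
all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to

\[
L(\psi,\theta) = \sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}}.
\tag{A.9}
\]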



The Euler-Lagrange equations pertinent to the problem of minimizing the cost over
this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i-\psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1}-\psi_i}
 = -\frac{d}{dx}\left[\log\frac{\theta_i}{\psi_i-\psi_{i-1}}\right]
\tag{A.10}
\]

for $i = 0,\dots,I-1$.

A.4. Characterization of the extremals under empty initial conditions.

In this section we consider the simplest and most important case, in which
the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint
constraint $(1,\omega,\beta)$ correspond twist parameters $C \ge 0$, $\rho > 0$ which
satisfy the equations

\[
\sum_{i=0}^{I}\omega_i + \sum_{i=I+1}^{\infty} C\,\mathcal{P}_i(\rho\beta) = 1,
\qquad
\sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C\,\mathcal{P}_i(\rho\beta) = \beta.
\]
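The two equations for the twist parameters can be solved numerically. The sketch below is an illustration under stated assumptions and is not the fixed-point procedure of the paper: it eliminates $C$ and applies bisection in $\rho$ to the ratio of the two Poisson tail sums, which is the conditional mean of a Poisson($\rho\beta$) variable given that it exceeds $I$. The names `twist_parameters` and `tail_sums`, the search bracket, and the truncation length are all hypothetical choices; the constraints are assumed strictly feasible (exponential case) with $\rho\beta$ below about 200.

```python
import math

def poisson_pmf(i, lam):
    # P_i(lam) computed in log space to avoid overflow of lam**i / i!.
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def tail_sums(I, lam, terms=600):
    """Return (sum_{i>I} P_i(lam), sum_{i>I} i P_i(lam)) by direct summation."""
    p = poisson_pmf(I + 1, lam)
    mass = mean = 0.0
    for i in range(I + 1, I + 1 + terms):
        mass += p
        mean += i * p
        p *= lam / (i + 1)          # P_{i+1} = P_i * lam / (i + 1)
    return mass, mean

def twist_parameters(omega, beta, iters=200):
    """Solve the two twist equations for (C, rho); illustrative sketch only."""
    I = len(omega) - 1
    slack_mass = 1.0 - sum(omega)                              # mass left for i > I
    slack_mean = beta - sum(i * w for i, w in enumerate(omega))

    def gap(rho):
        mass, mean = tail_sums(I, rho * beta)
        return mean / mass - slack_mean / slack_mass

    lo, hi = 1e-8, 200.0 / beta      # assumed bracket for rho
    for _ in range(iters):
        mid = math.sqrt(lo * hi)     # bisection on a logarithmic scale
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    rho = math.sqrt(lo * hi)
    C = slack_mass / tail_sums(I, rho * beta)[0]
    return C, rho
```

As a sanity check, if the $\omega_i$ are themselves the Poisson probabilities $\mathcal{P}_i(\beta)$, the equations are solved by $C = 1$, $\rho = 1$, and the sketch should return values close to these.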



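Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints,
and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of
functions $\gamma$ defined by

\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I}\bigl(\omega_k - C\,\mathcal{P}_k(\rho\beta)\bigr)\left(1-\frac{x}{\beta}\right)^{k},
\tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad 0\le i\le I,
\tag{A.9}
\]

\[
\gamma_{I+}(x) = 1-\sum_{i=0}^{I}\gamma_i(x),
\]

are extremals on $[0,\beta]$ which satisfy the terminal constraints along with the
initial constraint $\psi_0(0) = 1$.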
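The formulas of Theorem A.1 are explicit enough to evaluate directly. The following sketch differentiates $\psi_0$ term by term and forms the $\gamma_i$; it takes the twist parameters as given inputs. The function names `psi0_derivative` and `gamma_i` are hypothetical, chosen only for this illustration.

```python
import math

def poisson_pmf(i, lam):
    return math.exp(-lam) * lam ** i / math.factorial(i)

def psi0_derivative(order, x, C, rho, omega, beta):
    """order-th derivative of psi_0(x) as given in Theorem A.1 (sketch)."""
    total = C * (-rho) ** order * math.exp(-rho * x)
    for k in range(order, len(omega)):
        a_k = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - order)   # k!/(k-order)!
        total += (a_k * falling * (-1.0 / beta) ** order
                  * (1.0 - x / beta) ** (k - order))
    return total

def gamma_i(i, x, C, rho, omega, beta):
    """gamma_i(x) = x^i / i! * (-1)^i * psi_0^{(i)}(x)."""
    return (x ** i / math.factorial(i) * (-1) ** i
            * psi0_derivative(i, x, C, rho, omega, beta))
```

With $C = 1$, $\rho = 1$ and $\omega_i = \mathcal{P}_i(\beta)$ the polynomial part vanishes, so $\psi_0(x) = e^{-x}$ and $\gamma_i(x) = \mathcal{P}_i(x)$, the zero-cost Poisson path. In general one can check $\psi_0(0) = 1$ and $\gamma_i(\beta) = \omega_i$ numerically for any parameters satisfying the twist equations.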

Recall that in the special case $C = 0$, the first component of the extremal
is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I}\omega_k\left(1-\frac{x}{\beta}\right)^{k}.
\tag{A.10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$
as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem. 



Definition A.2. A nonnegative function $\varphi$ is completely monotone on
an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

\[
(-1)^i\varphi^{(i)}(x) \ge 0 \quad\text{for all } x\in[a,b] \text{ and } i\ge 0.
\tag{A.11}
\]

This definition is based on the one pertaining to Bernstein's theorem, which
characterizes Laplace transforms; see [15]. However, our definition differs in
that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone
on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x\in[0,\beta)$ and
$i = 0,\dots,I$. In the case $C > 0$, this can be strengthened to all $i = 0,1,\dots$.

Proof. Clearly, $\psi_0(x)$ is infinitely differentiable. Now suppose first that
$C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i\psi_0^{(i)}(x) = \rho^i C e^{-\rho x}
 + \sum_{k=i}^{I}\bigl(\omega_k - C\,\mathcal{P}_k(\rho\beta)\bigr)\,\beta^{-i}\frac{k!}{(k-i)!}\left(1-\frac{x}{\beta}\right)^{k-i}
\]

for $i = 0,\dots,I$, and

\[
(-1)^i\psi_0^{(i)}(x) = \rho^i C e^{-\rho x}
\tag{A.12}
\]

for $i > I$.

It is clear that $a(x) = (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$.
Moreover, $(-1)^i a^{(i)}(x) > 0$, $x\in[0,\infty)$, $i = 0,1,\dots$. We deduce that
$\varphi(x) = (-1)^I\psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\[
\varphi(\beta) = \frac{I!}{\beta^{I}}\,\omega_I \ge 0,
\]

so that $\varphi(x) > 0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone
on $[0,\beta]$ and that the derivative constraint is strict on $[0,\beta)$. Proceeding
inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$ the argument proceeds similarly on noting that

\[
(-1)^I\psi_0^{(I)}(x) = \frac{I!}{\beta^{I}}\,\omega_I > 0 \quad\text{for all } x\in[0,\beta]. \qquad □
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately
by inspection. The initial constraints follow from the construction
of $C$ in (2.6), since

\[
\psi_0(0) = C\Bigl(1-\sum_{i=0}^{I}\mathcal{P}_i(\rho\beta)\Bigr) + \sum_{i=0}^{I}\omega_i = 1.
\]

A similar computation also using (2.6) shows that $\psi_0^{(1)}(0) = -1$, a fact that
we will need shortly.

To establish that the given functions are valid occupancy curves, it is
useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$
obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad i = 0,1,\dots,I,I+1,\dots.
\tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series
representation of unlimited radius about any point $x$. Then
$\psi_0(0) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus,

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since, by Lemma A.2, the $\gamma_i$ are nonnegative on $[0,\beta]$, we have
$\{\gamma_i(x)\}\in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all
$i\ge 0$, the rate of decrease of each cumulative occupancy is

\[
\theta_i(x) = -\dot\psi_i(x) = -\sum_{k=0}^{i}\dot\gamma_k(x)
 = \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\tag{A.14}
\]

for $x\in[0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it
follows from (A.14) that

\[
\sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty}-\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid
infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1
are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and
condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I}\theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0,\beta]$. Thus, the finite-dimensional
occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler-Lagrange
differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$.
The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1+\sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty}-\dot\psi_i(x)
 = \rho C\sum_{i=I+1}^{\infty}\mathcal{P}_i(\rho x),
\]

\[
1-\psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty}\mathcal{P}_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and
the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1-\psi_I(x)) = \rho$
is constant, and the corresponding terms drop out of the right-hand side of
(A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it
follows that

\[
\frac{-\dot\psi_i(x)}{\gamma_i(x)} = \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x\in(0,\beta),\ i\ge 0.
\tag{A.15}
\]

Then

\[
\frac{d}{dx}\log\frac{\theta_i(x)}{\gamma_i(x)}
 = \frac{\psi_0^{(i+2)}(x)}{\psi_0^{(i+1)}(x)} - \frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
 = -\frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)} + \frac{\theta_i(x)}{\gamma_i(x)},
\tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using
the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0,\dots,I$, so
that (A.16) implies (A.10). □

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal
the twist parameter $\rho$ may be interpreted as follows. The expected
rate for balls to enter urns with more than $I$ balls is equal to the proportion
of such urns, namely, $1-\psi_I$. The twist parameter $\rho$ is then a multiplicative
factor applied to $1-\psi_I$ to give the actual rate at which balls enter these
urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while
if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy
distribution of the high-occupancy urns remains Poisson but with a
modified parameter.

The infinite sequence of occupancy functions introduced in the proof just
given is a useful construct. Operations which are technically difficult in
infinite dimensions may nevertheless be carried out formally on the infinite
sequence of functions, giving insight into the solution for the
finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general
form which will also apply to the nonempty case of the next section. The
conditions of the lemmas are satisfied by the exponential and polynomial
extremals of Theorem A.1, as can be readily verified.



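Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$ with

\[
(-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad\text{for } i > I,
\]

for some $I\ge 0$, $C > 0$, and $\rho > 0$. Further suppose that $\{\gamma_0,\gamma_1,\dots\}$ are an
infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty}-\dot\psi_i(x) = 1, \qquad
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\]

for all $x\in[0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of
functions $\{\gamma_0,\dots,\gamma_I,\gamma_{I+}\}$, where $\gamma_{I+} = 1-\sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is
given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty}\Bigl[\gamma_i(\beta)\log\bigl|\psi_0^{(i)}(\beta)\bigr| - \gamma_i(0)\log\bigl|\psi_0^{(i)}(0)\bigr|\Bigr].
\tag{A.17}
\]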

i=0 

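Proof. For all $i > I$ and $x\in[0,\beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore,
using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1-\sum_{i=0}^{I}-\dot\psi_i = \sum_{i=I+1}^{\infty}-\dot\psi_i
 = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho\,(1-\psi_I).
\]

Then for each $x\in[0,\beta]$, the cost function $L$ can be interpreted as an
infinite-dimensional cost function $L_\infty$. Indeed, since
$-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and
$(1+\sum_{i=0}^{I}\dot\psi_i)/(1-\psi_I) = \rho$,

\[
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i-\psi_{i-1}}
 + (1+\dot\psi_0+\cdots+\dot\psi_I)\log\frac{1+\dot\psi_0+\cdots+\dot\psi_I}{1-\psi_I} \\
&= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\gamma_i} = L_\infty(\psi,\dot\psi).
\end{aligned}
\]

Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always
have $L_\infty(\psi,\dot\psi)\ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$.
Note that $\sum_i -\dot\psi_i(x) = 1$, $x\in[0,\beta]$, and that given $\varepsilon > 0$, the quantities
$\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x\in[\varepsilon,\beta-\varepsilon]$. By the
monotone convergence theorem

\[
J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)
 \log\left(\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\right)dx.
\]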

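Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be
written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log\bigl|\psi_0^{(i+1)}(x)\bigr|\,dx
 + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log\bigl|\psi_0^{(i)}(x)\bigr|\,dx \\
&\qquad = \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x)-\dot\psi_{i-1}(x)\bigr)\log\bigl|\psi_0^{(i)}(x)\bigr|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first
line converge. But this follows from the finite mean condition on $\{\gamma_i(x)\}$
since, for $i > I$,

\[
-\dot\psi_i(x)\log\bigl|\psi_0^{(i)}(x)\bigr| = \rho\,\gamma_i(x)\,[\log C + i\log\rho - \rho x]
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and
using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty}\Bigl[\gamma_i(x)\log\bigl|\psi_0^{(i)}(x)\bigr|\Bigr]_{\varepsilon}^{\beta-\varepsilon}
 + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\,dx.
\]

The lemma follows on taking limits as $\varepsilon\downarrow 0$. □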

A similar result holds in the case of the polynomial extremal. 

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely
monotone on $[0,\beta]$. Let $\{\gamma_0,\dots,\gamma_I\}$ be nonnegative functions on $[0,\beta]$
satisfying

\[
\sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1}-\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i\in\{0,\dots,I-1\},
\]

for all $x\in[0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I}\Bigl[\gamma_i(\beta)\log\bigl|\psi_0^{(i)}(\beta)\bigr| - \gamma_i(0)\log\bigl|\psi_0^{(i)}(0)\bigr|\Bigr].
\tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9)
from $0$ to $\beta$, using the same substitutions and integration by parts as in the
proof of Lemma A.3. □

A.6. Characterization of the extremal cost. In the case of empty initial
conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1
satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in
(A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy
$D(\gamma(\beta)\|\mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson, zero cost distribution.
It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy, among all
distributions for which the first $I+1$ elements are determined by $\omega$, and
which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$,
we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed
according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$.
Then

\[
J(\gamma) = \min_{\pi\in F(1,\omega,\beta)} D(\pi\|\mathcal{P}(\beta)).
\]

Proof. We first solve the minimization problem, and then relate the
problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since
then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal,
the given minimization problem can be solved using Lagrange multipliers,
which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$
associated with the given endpoint constraints. We consider the set of
nonnegative sequences $\{\pi_i\}_{i=0}^{\infty}\in\mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0,\dots,I$, and
define the Lagrangian $\mathcal{L}$ on this set to be



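\[
\mathcal{L}(\pi;y,z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)}
 + y\Bigl(1-\sum_{i=0}^{\infty}\pi_i\Bigr) + z\Bigl(\beta-\sum_{i=0}^{\infty} i\,\pi_i\Bigr),
\]

where $z = \log\rho$ and $y = \log C + (1-\rho)\beta + 1$.

Define $\pi_i^* = \gamma_i(\beta)$, where the latter are determined as in the proof of
Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*$, $z$ and $y$
and the strict convexity of $x\log x$ imply that $\pi_i^*$ is the unique global
minimizer of $x\mapsto x\log x - x[\log\mathcal{P}_i(\beta) + y + iz]$. Therefore, for any $i > I$ and
$\pi_i\in[0,\infty)$,

\[
\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\pi_i - zi\pi_i
 \ge \pi_i^*\log\frac{\pi_i^*}{\mathcal{P}_i(\beta)} - y\pi_i^* - zi\pi_i^*.
\]

Summing over $i > I$ shows that $\pi^*$ minimizes $\mathcal{L}(\cdot\,;y,z)$ over this set of
sequences. Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi\in F(1,\omega,\beta)} D(\pi\|\mathcal{P}(\beta)) &= \inf_{\pi\in F(1,\omega,\beta)}\mathcal{L}(\pi;y,z) \\
&\ge \inf_{\pi}\mathcal{L}(\pi;y,z) \\
&= D(\pi^*\|\mathcal{P}(\beta)).
\end{aligned}
\]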

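Since $\pi^*\in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the
minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the
strict convexity of the relative entropy with respect to its first argument.
Substituting the particular form of $\pi^*$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\mathcal{P}_i(\beta)} = D(\pi^*\|\mathcal{P}(\beta)),
\]

where

\[
\pi_i^* = \begin{cases}\omega_i, & 0\le i\le I,\\ C\,\mathcal{P}_i(\rho\beta), & i > I.\end{cases}
\qquad □
\]

The cost for an exponential extremal can be written explicitly in terms
of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{\mathcal{P}_i(\beta)}
 + \Bigl(1-\sum_{i=0}^{I}\omega_i\Bigr)\bigl(\log C + (1-\rho)\beta\bigr)
 + \Bigl(\beta-\sum_{i=0}^{I} i\,\omega_i\Bigr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given
by the first term in the above expression.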
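The explicit cost can be cross-checked against a direct evaluation of the relative entropy $D(\pi^*\|\mathcal{P}(\beta))$ with $\pi_i^* = \omega_i$ for $i\le I$ and $\pi_i^* = C\,\mathcal{P}_i(\rho\beta)$ for $i > I$. The sketch below is illustrative only: the function names are hypothetical, the closed form includes the contribution $\log\rho\sum_{i>I} i\,\pi_i^* = (\beta-\sum_{i\le I} i\omega_i)\log\rho$ of the Poisson tail, and the direct sum is truncated after 60 tail terms, which is an assumption suitable for moderate values of $\rho\beta$.

```python
import math

def poisson_pmf(i, lam):
    return math.exp(-lam) * lam ** i / math.factorial(i)

def cost_closed_form(omega, beta, C, rho):
    """Closed-form cost in terms of omega, beta and the twist parameters."""
    mass = sum(omega)
    mean = sum(i * w for i, w in enumerate(omega))
    head = sum(w * math.log(w / poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0)
    if C == 0:                       # polynomial case: only the first term
        return head
    return (head + (1.0 - mass) * (math.log(C) + (1.0 - rho) * beta)
            + (beta - mean) * math.log(rho))

def cost_direct(omega, beta, C, rho, terms=60):
    """Truncated sum for D(pi* || P(beta)); the truncation is an assumption."""
    I = len(omega) - 1
    pi = list(omega) + [C * poisson_pmf(i, rho * beta)
                        for i in range(I + 1, I + 1 + terms)]
    return sum(p * math.log(p / poisson_pmf(i, beta))
               for i, p in enumerate(pi) if p > 0)
```

The two functions agree only when $(C,\rho)$ actually satisfy the twist equations for the given $(\omega,\beta)$; for other inputs the closed form has no meaning.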



A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler-Lagrange equations for the case
when the urns are not all initially empty, but in which they may contain up to
$K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$,
and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of
generality, we may assume that some urns are initially empty (hence, $0\in\mathcal{K}$)
and we may take $K = \max\mathcal{K}$.

In the case $K\ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according
to their initial occupancies. We denote the final number of additional
balls per urn entering the $k$th class by $\beta_k$, in which case the total number of
additional balls per urn is $\beta = \sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the
Euler-Lagrange equations that the fraction of balls entering the $k$th class in
any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the
evolution of additional balls entering each class of urns becomes an occupancy
problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma_{(k)} = \{\gamma_{k,i}\}_i$, where the function
$\gamma_{k,i}:[0,\beta_k]\to[0,1]$ represents the fraction of class $k$ urns which contain $i$
additional balls (thus, $k+i$ total balls) after $x$ balls per urn have been given
to this class. The overall extremal occupancy curves for the general initial
conditions are then given by

\[
\gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma_{(k)}$ for the $k$th subproblem is an
extremal of the form given by Theorem A.1, with $I-k$ terminal constraints.
Given the appropriate subproblem terminal conditions $(1,\omega_{(k)},\beta_k)$, the
results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and
corresponding extremals.

At this point we come to the main obstacle in determining the extremals
for nonempty initial conditions. This is to show that there exist subproblem
terminal conditions $(\omega_{(k)},\beta_k)$ which yield the extremal solution in the
overall problem. To show that such conditions and the corresponding
extremals exist, we will first give the form of the final cost function (which
can be obtained by formal arguments). We then show that the cost function
is the solution of a minimization problem, that the problem has a unique
minimizing argument, that it has corresponding Lagrange multipliers, and
that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions
$\alpha$ turns out to be

\[
\min_{\pi\in S_\infty^{|\mathcal{K}|}}\ \sum_{k\in\mathcal{K}}\alpha_k\,D(\pi_{(k)}\|\mathcal{P}(\beta)),
\tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\,\pi_{k,j} = \beta
\tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\,\pi_{k,i-k} \quad\text{for all } 0\le i\le I.
\tag{A.22}
\]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi_{(k)}\}_{k\in\mathcal{K}}$, where
each $\pi_{(k)}$ is a distribution on the nonnegative integers. It is straightforward to
show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints,
as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial
problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which
case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$ as
well as in the conservation condition (2.4). Unlike in the empty case, the
minimization problem does not become trivial in the polynomial case since
degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the
polynomial case, the constraints imply that

\[
\pi_{k,j} = 0, \qquad k+j > I,
\tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem
can be stated equivalently as minimization of (A.20) subject to (A.23)
and the endpoint constraints (A.22), in which case the conservation
constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi_{(0)},\dots,\pi_{(k)},\dots,\pi_{(K)})$ is such
that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0\le k\le K$
yields an equivalent problem in which the minimizing solution is not unique,
since the $\pi_{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function
or the constraints. In the form given, however, the solution can be shown to
be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there
is a unique vector of distributions $\pi^*\in S_\infty^{|\mathcal{K}|}$ that minimizes (A.20) subject
to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions
$\pi\in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$
implies that $F$ has at least one element. Moreover, it has at least one element
with finite support and, thus, finite cost. A sketch argument for this point
is as follows: First of all, any set of exponential constraints with terminal
condition $\omega\in S_I$ can be reformulated as a set of polynomial constraints with
terminal condition $\tilde\omega\in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints
$(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints, and would have
finite support. It remains to show that there is at least one feasible point
for each set of feasible polynomial constraints. Such a feasible point may be
constructed by an ordered filling construction. First, we assign $\pi_{0,0} = \omega_0/\alpha_0$.
The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible
with less than or equal to unit mass. Next, some of the remaining mass
from the distribution $\pi_{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi_{(1)}$ is applied to
$\pi_{1,0}$ as necessary until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That
$\pi_{(0)}$ and $\pi_{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition
in (2.3). This process continues until all constraints $\omega_i$ have been satisfied.
The equality in the conservation condition (2.4) implies that the final step
uses up all of the probability mass in the distributions $\{\pi_{(k)}\}$.
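The ordered filling construction described above can be sketched in a few lines. This is an illustration under assumptions, not code from the paper: the function name `ordered_filling` is hypothetical, classes are filled in increasing order of $k$ at each level (one of several admissible orders), and the result is stored as the mass $\alpha_k\pi_{k,i-k}$ of class-$k$ urns that end at occupancy $i$.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy feasible point for polynomial constraints (sketch).

    alpha : [alpha_0, ..., alpha_K], initial occupancy fractions
    omega : [omega_0, ..., omega_I], terminal occupancy fractions
    Returns mass[k][i] = alpha_k * pi_{k, i-k}.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    remaining = list(alpha)                     # unassigned mass of each class
    mass = [[0.0] * (I + 1) for _ in range(K + 1)]
    for i in range(I + 1):
        need = omega[i]
        # Only classes with initial occupancy k <= i can end at level i.
        for k in range(min(i, K) + 1):
            take = min(need, remaining[k])
            mass[k][i] = take
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition fails at level %d" % i)
    return mass
```

For example, with $\alpha = (0.5, 0.5)$ and $\omega = (0.2, 0.5, 0.3)$ the construction assigns masses $(0.2, 0.3, 0)$ to class 0 and $(0, 0.2, 0.3)$ to class 1, so that every terminal constraint is met and all of the mass in each class is used.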

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the
topology of weak convergence, with the distance between distributions given
by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance
is simply

\[
\sum_{i:\,P_i>Q_i}(P_i-Q_i) = \sum_{i:\,Q_i>P_i}(Q_i-P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a
product space. For each $A < \infty$, define the set

\[
H_A = \Bigl\{\pi\in S_\infty^{|\mathcal{K}|} : \sum_{k\in\mathcal{K}}\alpha_k D(\pi_{(k)}\|\mathcal{P}(\beta)) \le A\Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology
([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later
reference, we note that a finite sum of relative entropy functions also inherits
from the relative entropy the properties of lower semi-continuity and of strict
convexity with respect to its first argument. As the next step in proving the
lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha,\omega,\beta) = F(\alpha,\omega,\beta)\cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a
limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$
is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough
to verify for each convergent sequence of distributions $\pi^{(n)}\in Q_A$ that the
limiting distribution $\bar\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate
since this set is compact.) The only difficulty lies in showing that the mean
of the $k$th subdistribution $\bar\pi_{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$.
This will be the case if we can show for any sequence in $Q_A$ and for an
arbitrary $k$ that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present
context, uniform integrability means that for each $\varepsilon > 0$, there is $m < \infty$
such that

\[
\sum_{j>m} j\,\pi^{(n)}_{k,j} < \varepsilon
\tag{A.24}
\]

for all $n$ and $k$.

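Since $e^{y}-1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^{y}-1),
\]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for
all $y$ and all $x\ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make
use of the fact that the Poisson distribution has exponential moments: for
any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta}\,\mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{-\beta}\frac{(\beta e^{1/\delta})^{j}}{j!}
 = e^{\beta(e^{1/\delta}-1)} < \infty.
\]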
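Both facts used here are easy to check numerically. The sketch below evaluates the slack in the inequality $xy \le (x\log x - x + 1) + (e^{y}-1)$, which vanishes at the maximizer $x = e^{y}$, and a partial sum of the exponential moment of the Poisson distribution, which can be compared with $e^{\beta(e^{1/\delta}-1)}$. The function names and the truncation length are assumptions made for illustration.

```python
import math

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y; nonnegative for x >= 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0   # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(beta, delta, terms=150):
    """Partial sum of sum_j e^{j/delta} P_j(beta) (truncation assumed)."""
    total = 0.0
    term = math.exp(-beta)                      # j = 0 term
    for j in range(terms):
        total += term
        term *= beta * math.exp(1.0 / delta) / (j + 1)
    return total
```

The gap is zero exactly when $x = e^{y}$, which is the point where the supremum defining $e^{y}-1$ is attained.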




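Taking $y = j/\delta$ and $x = \pi^{(n)}_{k,j}/\mathcal{P}_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j>m} j\,\pi^{(n)}_{k,j}
&\le \delta\sum_{j\ge 0}\left[\pi^{(n)}_{k,j}\log\frac{\pi^{(n)}_{k,j}}{\mathcal{P}_j(\beta)} - \pi^{(n)}_{k,j} + \mathcal{P}_j(\beta)\right]
 + \delta\sum_{j>m}\mathcal{P}_j(\beta)\bigl(e^{j/\delta}-1\bigr) \\
&= \delta\,D(\pi^{(n)}_{(k)}\|\mathcal{P}(\beta)) + \delta\sum_{j>m}\mathcal{P}_j(\beta)\bigl(e^{j/\delta}-1\bigr),
\end{aligned}
\]

where the equality uses the fact that both $\pi^{(n)}_{(k)}$ and $\mathcal{P}(\beta)$ are probability
distributions. Since $D(\pi^{(n)}_{(k)}\|\mathcal{P}(\beta)) \le A/\alpha_k$ on $Q_A$, (A.24) follows by first
picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also for
use in Corollary A.7 that the analogous result holds if $\beta$ is replaced by a
sequence $\beta^{(n)}\to\beta\in(0,\infty)$.]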

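We now apply the uniform integrability to analyze the limits of the means.
It follows from [5], Theorem 5.4, that

\[
\beta_k^{(n)} = \sum_{j=0}^{\infty} j\,\pi^{(n)}_{k,j} \to \sum_{j=0}^{\infty} j\,\bar\pi_{k,j} = \bar\beta_k.
\]

Finally,

\[
\sum_{k}\alpha_k\bar\beta_k = \lim_{n}\sum_{k}\alpha_k\beta_k^{(n)} = \lim_{n}\beta = \beta,
\]

so that $\bar\pi\in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.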

We have shown that $Q_A$ is compact under the Prohorov metric and that
it is nonempty. Now let

\[
G = \inf_{\pi\in Q_A}\ \sum_{k\in\mathcal{K}}\alpha_k D(\pi_{(k)}\|\mathcal{P}(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)}\in Q_A$ such that $\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(n)}_{(k)}\|\mathcal{P}(\beta)) < G + 1/n$.
Since $Q_A$ is compact this has a convergent subsequence, and to simplify
the notation we index this subsequence by $n$. Let $\pi^*\in Q_A$ be the limit
point. Since relative entropy $D(\pi\|\eta)$ is lower semi-continuous in $\pi$ ([11],
Lemma 1.4.3(b)), it follows that

\[
\sum_{k}\alpha_k D(\pi^*_{(k)}\|\mathcal{P}(\beta)) \le \liminf_{n}\sum_{k}\alpha_k D(\pi^{(n)}_{(k)}\|\mathcal{P}(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this
minimizing vector of distributions follows from the strict convexity of the
objective function. □

Before continuing with the question of the existence of the extremals, we
pause to obtain a corollary which will be useful in establishing the strong
minimum in Theorem A.14. Define $J(\alpha,\omega,\beta)$ to be the minimum cost in
the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

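Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible
optimization problems with costs $J^{(n)} = J(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that
$(\alpha^{(n)},\omega^{(n)},\beta^{(n)})\to(\alpha,\omega,\beta)$
componentwise and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_{n} J^{(n)} \ge J(\alpha,\omega,\beta).
\]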

n 

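Proof. For any $A > \liminf_n J^{(n)} = G$, we may choose a subsequence of
problems such that

\[
J^{(n)} < G + 1/n < A,
\]

and for each problem in this subsequence, we define $\pi^{(n)}\in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$
to be the minimizing solution $\pi^*$ given in the above lemma. By compactness,
there is a further subsequence such that $\pi^{(n)}\to\bar\pi\in H_A$. As in the proof of
Lemma A.6, if $\beta < \infty$, then

\[
\beta_k^{(n)} = \sum_{j=0}^{\infty} j\,\pi^{(n)}_{k,j} \to \bar\beta_k = \sum_{j=0}^{\infty} j\,\bar\pi_{k,j}, \qquad k = 0,1,\dots,K.
\]

If $\alpha_k^{(n)}\to 0$ and $\beta_k^{(n)}\alpha_k^{(n)}\not\to 0$, then the term $\alpha_k^{(n)}D(\pi^{(n)}_{(k)}\|\mathcal{P}(\beta^{(n)}))$ would go to
infinity; to see this, note that the infimum of $D(\pi\|\mathcal{P}(\beta))$ over distributions
$\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost,
we must have $\alpha_k^{(n)}\beta_k^{(n)}\to\alpha_k\bar\beta_k$ even if $\alpha_k = 0$.
Thus,

\[
\sum_{k\in\mathcal{K}}\bar\beta_k\alpha_k = \lim_{n}\sum_{k}\beta_k^{(n)}\alpha_k^{(n)} = \lim_{n}\beta^{(n)} = \beta,
\]

and $\bar\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of
the relative entropy in both arguments ([11], Lemma 1.4.3(b)) we have

\[
\begin{aligned}
\liminf_{n}\sum_{k=0}^{K}\alpha_k^{(n)}D(\pi^{(n)}_{(k)}\|\mathcal{P}(\beta^{(n)}))
&\ge \liminf_{n}\sum_{k:\,\alpha_k>0}\alpha_k^{(n)}D(\pi^{(n)}_{(k)}\|\mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k>0}\alpha_k D(\bar\pi_{(k)}\|\mathcal{P}(\beta)) \\
&\ge J(\alpha,\omega,\beta)
\end{aligned}
\]

as desired. □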
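The value quoted in the proof for the smallest relative entropy among distributions with a prescribed mean $\lambda$ is the relative entropy of the Poisson($\lambda$) law with respect to the Poisson($\beta$) law. The sketch below checks this identity numerically by direct summation; it does not verify that the Poisson law is the minimizer, which is the standard exponential-tilting fact. The function names and the truncation length are assumptions.

```python
import math

def poisson_relative_entropy(lam, beta, terms=120):
    """Truncated sum for D(Poisson(lam) || Poisson(beta))."""
    total = 0.0
    log_p = -lam                     # log P_0(lam)
    log_q = -beta                    # log P_0(beta)
    for j in range(terms):
        total += math.exp(log_p) * (log_p - log_q)
        log_p += math.log(lam) - math.log(j + 1)
        log_q += math.log(beta) - math.log(j + 1)
    return total

def closed_form(lam, beta):
    """beta - lam + lam * log(lam / beta)."""
    return beta - lam + lam * math.log(lam / beta)
```

Because the closed form grows like $\lambda\log\lambda$, a class whose mean $\beta_k^{(n)}$ tends to infinity while $\alpha_k^{(n)}\beta_k^{(n)}$ stays bounded away from zero contributes an unbounded cost, which is the point used in the argument.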

We now turn to the existence of the Lagrange multipliers corresponding
to the minimization problem (A.20). For a given vector $\omega$, it will be helpful
to define the set of integers $\mathcal{I}(\omega)$, where $i\in\mathcal{I}(\omega)$ if $0\le i\le I$ and $\omega_i > 0$, or
if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which
the constraints do not force to be empty. As we will show, for irreducible
constraints, the optimal $\{\pi_{k,j}\}$ with $k+j\in\mathcal{I}(\omega)$ are always strictly positive.
This result does not hold directly for reducible constraints, but as we
discussed earlier, any problem with reducible constraints can be replaced by
a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints,
and let $\pi^*\in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22).
Then $\pi^*_{k,j} > 0$ for all $k+j\in\mathcal{I}(\omega)$ and $\pi^*_{k,j} = 0$ if $k+j\notin\mathcal{I}(\omega)$.

Proof. For $k+j\notin\mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all
feasible points of the problem, and hence, for the minimizer in particular.
The main point is to show that it is feasible for any other element to be
positive, and the result will then follow from the infinite derivative of the
objective function near the boundary.

Let $\pi\in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and
suppose that $\pi_{m,n} = 0$ for some particular $(m,n)$ with $m+n\in\mathcal{I}(\omega)$. Let
$\hat\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$ subject to the
restrictions that $\hat\pi_{m,n} > 0$, that $\hat\pi_{k,j} = 0$ for all $k+j\notin\mathcal{I}(\omega)$, and that $\hat\pi$ has
finite support. If we define

\[
\hat\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\,\hat\pi_{k,i-k}, \qquad
\hat\beta = \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\,\hat\pi_{k,j},
\]

then $\hat\pi$ is a feasible solution for the constraints $(\alpha,\hat\omega,\hat\beta)$. Next, for any $\eta > 0$,
we may define

\[
\tilde\beta \doteq (\beta-\eta\hat\beta)/(1-\eta), \qquad \tilde\omega \doteq (\omega-\eta\hat\omega)/(1-\eta).
\]

By construction, the constraints $(\alpha,\tilde\omega,\tilde\beta)$ are feasible for sufficiently small
$\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies
all constraints in (2.3) and (2.4) with strict inequality. In the polynomial
case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the
conservation constraint (2.4) with equality. However, it may easily be verified
that these two constraints also hold with equality for $(\alpha,\hat\omega,\hat\beta)$ and, hence,
for $(\alpha,\tilde\omega,\tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha,\tilde\omega,\tilde\beta)$, and finally form
$\bar\pi = \eta\hat\pi + (1-\eta)\tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding
to the original constraints $(\alpha,\omega,\beta)$, and, in addition, $\bar\pi_{m,n} > 0$. As discussed
in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support so that $\bar\pi$
also has finite support.

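We will show that $\pi$ is not a minimizer by proving that a sufficiently
small perturbation toward $\bar\pi$ reduces the objective function. Because the
constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\bar\pi + (1-\varepsilon)\pi$ are feasible solutions to
the original constraints for all $0\le\varepsilon\le 1$. Let $f(\varepsilon)$ denote the value of the
objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\left(\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)} + 1\right)(\bar\pi_{k,j}-\pi_{k,j}) \\
&= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)}\,(\bar\pi_{k,j}-\pi_{k,j}) \\
&= \alpha_m\bar\pi_{m,n}\log\frac{\pi^{\varepsilon}_{m,n}}{\mathcal{P}_n(\beta)}
 + \sum_{k\in\mathcal{K},\ (k,j)\ne(m,n)}\alpha_k\bar\pi_{k,j}\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)}
 - \sum_{k\in\mathcal{K}}\sum_{j}\alpha_k\pi_{k,j}\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon\to 0$, the third term in the last display tends to

\[
-\sum_{k}\alpha_k D(\pi_{(k)}\|\mathcal{P}(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k\alpha_k\sum_j\bar\pi_{k,j}\log\mathcal{P}_j(\beta)$,
which is finite because $\bar\pi$ has finite support. The first term in the display
tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small
$\delta > 0$. We have shown that, for any $m+n\in\mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20)
if $\pi_{m,n} = 0$. As $\pi_{m,n}$ is arbitrary within $m+n\in\mathcal{I}(\omega)$, it follows that $\pi^*_{m,n} > 0$
whenever $m+n\in\mathcal{I}(\omega)$. □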

We now establish the existence of Lagrange multipliers and the form of 
the optimal solution for the polynomial case. 

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints
yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer
from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$
such that the minimizer takes the form

\[
\pi^*_{k,j} = \begin{cases} D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k\in\mathcal{K},\ j+k\in\mathcal{I}(\omega),\\ 0, & \text{otherwise.}\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible
point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$
and, hence, the minimization problem (A.20) may be considered to be a
finite-dimensional problem over this set. By Lemma A.8 the minimizer $\pi^*$ is
strictly positive. Since the objective function is continuously differentiable
in a neighborhood of $\pi^*$, and the constraints are linear, Lagrange multipliers
are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33).
Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
\mathcal{L}(\pi) = \sum_{k\in\mathcal{K}}\alpha_k\sum_{k+j\in\mathcal{I}}\pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
 + \sum_{k\in\mathcal{K}} z_k\alpha_k\Bigl(1-\sum_{k+j\in\mathcal{I}}\pi_{k,j}\Bigr)
 + \sum_{i\in\mathcal{I}} w_i\Bigl(\omega_i - \sum_{k}\alpha_k\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $\mathcal{L}$ being zero at the
optimal point. Taking partial derivatives and rearranging, the optimality
condition yields

\[
\pi^*_{k,j} = \mathcal{P}_j(\beta)\exp\bigl(z_k - 1 + w_{k+j}\bigr).
\]

The result follows on defining $D_k = e^{z_k-1}$ and $W_i = e^{w_i}$. □
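The product form $D_k\,\mathcal{P}_j(\beta)\,W_{k+j}$ suggests a simple way to compute the minimizer numerically in the polynomial case. Writing $m_{k,i} = \alpha_k\pi_{k,i-k}$, the objective (A.20) is a relative entropy of $m$ with respect to the reference masses $\alpha_k\mathcal{P}_{i-k}(\beta)$, with prescribed row sums $\alpha_k$ and column sums $\omega_i$. The sketch below applies standard iterative proportional fitting (alternating row and column rescaling) to this reference. This is a swapped-in numerical technique, not a method described in the paper; the function name, the fixed number of sweeps, and the assumption that the constraints are feasible and polynomial are all illustrative.

```python
import math

def poisson_pmf(j, lam):
    return math.exp(-lam) * lam ** j / math.factorial(j)

def fit_polynomial_minimizer(alpha, omega, beta, sweeps=2000):
    """Iterative proportional fitting for mass[k][i] = alpha_k * pi_{k, i-k}."""
    K, I = len(alpha) - 1, len(omega) - 1
    # Reference masses on the allowed support: i >= k, alpha_k > 0, omega_i > 0.
    mass = [[alpha[k] * poisson_pmf(i - k, beta)
             if (i >= k and alpha[k] > 0 and omega[i] > 0) else 0.0
             for i in range(I + 1)] for k in range(K + 1)]
    for _ in range(sweeps):
        for k in range(K + 1):                  # rows: each pi_(k) has mass one
            row = sum(mass[k])
            if row > 0:
                mass[k] = [v * alpha[k] / row for v in mass[k]]
        for i in range(I + 1):                  # columns: terminal conditions
            col = sum(mass[k][i] for k in range(K + 1))
            if col > 0:
                for k in range(K + 1):
                    mass[k][i] *= omega[i] / col
    return mass
```

Each rescaling multiplies a whole row or a whole column by a constant, so every iterate keeps the product form of Theorem A.9; only the values of the constants $D_k$ and $W_i$ change from sweep to sweep.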

We now give the corresponding theorem for the exponential case. 

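Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints
yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique
minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$, and
$\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[
\pi^*_{k,j} = \begin{cases}
C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\ k+j\in\mathcal{I},\\
C_k\,\mathcal{P}_j(\rho\beta), & k\in\mathcal{K},\ k+j > I,\\
0, & \text{otherwise.}
\end{cases}
\]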

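Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider
a sequence of truncated problems indexed by a sequence of integers
$M > I$. For each such $M$ define

\[
\beta^{[M]} = \sum_{k\in\mathcal{K}}\alpha_k\sum_{l<M} l\,\pi^*_{k,l}, \qquad \eta_k^{[M]} = \sum_{l<M}\pi^*_{k,l},
\]

and consider the problem of minimizing

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j<M,\ k+j\in\mathcal{I}}\pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
\]

subject to the constraints

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{l<M,\ k+l\in\mathcal{I}} l\,\pi_{k,l} = \beta^{[M]},
\]

\[
\sum_{l<M,\ k+l\in\mathcal{I}}\pi_{k,l} = \eta_k^{[M]}, \qquad k\in\mathcal{K},
\]

\[
\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} = \omega_i, \qquad i\in\mathcal{I},\ i\le I.
\]

By construction, the minimizer of this problem is obtained simply by truncating
$\pi^*$. As in the previous theorem, the strict positivity of the minimizer,
together with the linearity of the constraints, guarantees the existence of
Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th
problem is

\[
\begin{aligned}
\mathcal{L}^{[M]}(\pi) ={}& \sum_{k\in\mathcal{K}}\alpha_k\sum_{j<M,\ k+j\in\mathcal{I}}\pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
 + y^{[M]}\Bigl(\beta^{[M]} - \sum_{k\in\mathcal{K}}\alpha_k\sum_{l<M,\ k+l\in\mathcal{I}} l\,\pi_{k,l}\Bigr) \\
&+ \sum_{k\in\mathcal{K}} z_k^{[M]}\alpha_k\Bigl(\eta_k^{[M]} - \sum_{l<M,\ k+l\in\mathcal{I}}\pi_{k,l}\Bigr)
 + \sum_{i\in\mathcal{I},\ i\le I} w_i^{[M]}\Bigl(\omega_i - \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k}\Bigr).
\end{aligned}
\]

Since the derivative of $\mathcal{L}^{[M]}$ with respect to $\pi_{k,j}$ must be zero at the minimizer,
it follows that

\[
\pi^*_{k,j} = \mathcal{P}_j(\beta)\exp\bigl(-1 + j\,y^{[M]} + z_k^{[M]} + w_{k+j}^{[M]}\bigr)
\]

for all $j < M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i^{[M]} = 0$
for $i > I$. For $k+j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)}\,e^{y^{[M]}},
\]

so that $y^{[M]}$ is independent of $M$. Since $w_{k+j}^{[M]} = 0$ if $k+j > I$, it follows that
$z_k^{[M]}$ and $w_i^{[M]}$ are also independent of $M$. Then the above expression for
$\pi^*_{k,j}$ holds for all $k+j\in\mathcal{I}$, with fixed Lagrange multipliers $y$, $\{z_k\}$ and $\{w_i\}$.
The form expressed in the theorem is based on the substitutions $\rho = e^{y}$,
$C_k = e^{(\rho-1)\beta-1+z_k}$ and $W_i = e^{w_i}$. □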

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$,
we are ready to characterize the extremal curves. We denote the mean of
the $k$th distribution $\pi^*_{(k)}$ by $\beta_k = \sum_j j\,\pi^*_{k,j}$. In the exponential case, we define
$\rho_k = \rho\beta/\beta_k$ so that the distribution $\pi^*_{(k)}$ meets $I-k$ terminal constraints,
has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by
$\omega_{k,j} = \pi^*_{k,j}$ and $\omega_{(k)} = (\omega_{k,0},\dots,\omega_{k,I-k})$, we see that $\pi^*_{(k)}$ is the minimizing
distribution arising in Theorem A.1 for the terminal constraints $(1,\omega_{(k)},\beta_k)$,
with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the
terminal constraints $(1,\omega_{(k)},\beta_k)$ mean that each subproblem is polynomial,
with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal
occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show
that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{k,j}\}$ are the minimizing distributions
from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the
means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{k,j}$, and let
$\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems
$(1,\omega_{(k)},\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta),
\qquad i = 0, \ldots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are 
irreducible. Otherwise, each irreducible subproblem may be treated sepa- 
rately. 

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended
definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A.1 [see (A.13)].
Also, for convenience, define
$\bar\gamma_{k,j}(x) = \gamma_{k,j}(x\beta_k/\beta)$ and likewise, define
$\bar\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k \bar\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x)
= \sum_{i=0}^{\infty}\sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x)
= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{k,j}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x)
= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{k,j}(x)
= \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{k,j}$ satisfy
the conservation constraint (A.21). Hence, curves $\gamma_i(x)$, $i \le I$, and
$\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.
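The two normalization checks above can be illustrated numerically in the simplest setting. The sketch below assumes the unconstrained (zero-cost) case, in which every subproblem curve is the Poisson probability $\mathcal{P}_j(x) = e^{-x}x^j/j!$ and $\beta_k = \beta$, so that no rescaling of $x$ is needed. The initial distribution `alpha`, the truncation level, and all function names are hypothetical choices for illustration.

```python
import math


def poisson_pmf(j, mean):
    """Poisson probability e^{-mean} * mean^j / j!."""
    return math.exp(-mean) * mean ** j / math.factorial(j)


def occupancy(i, x, alpha):
    """gamma_i(x) = sum_{k<=i} alpha_k * P_{i-k}(x) (assumed Poisson subproblems)."""
    top = min(i, len(alpha) - 1)
    return sum(alpha[k] * poisson_pmf(i - k, x) for k in range(top + 1))


def cumulative(i, x, alpha):
    """psi_i(x) = gamma_0(x) + ... + gamma_i(x)."""
    return sum(occupancy(m, x, alpha) for m in range(i + 1))


def rate(i, x, alpha, h=1e-6):
    """-d/dx psi_i(x), approximated by a central difference."""
    return -(cumulative(i, x + h, alpha) - cumulative(i, x - h, alpha)) / (2.0 * h)


alpha = [0.5, 0.3, 0.2]   # hypothetical initial occupancy fractions, summing to 1
x = 0.8                   # "time": balls thrown per cell so far
n_levels = 40             # truncation of the infinite sums

total_mass = sum(occupancy(i, x, alpha) for i in range(n_levels))
rates = [rate(i, x, alpha) for i in range(n_levels)]
total_rate = sum(rates)
print(total_mass, total_rate, min(rates))
```

Both totals should be close to 1, and the individual rates should be nonnegative up to finite-difference noise, mirroring the two displays in this simplified case.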

It is clear that the $\gamma_i$ satisfy the desired initial conditions since
$\gamma_{k,0}(0) = 1$ and $\gamma_{k,j}(0) = 0$ for $j > 0$. The terminal
conditions are guaranteed by the fact that the values
$\gamma_{k,j}(\beta_k) = \pi^*_{k,j}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler-Lagrange equations,
we first show that the rescaled zero-occupancy curve $\bar\psi_{k,0}$ for the
$k$th subproblem is proportional to the $k$th derivative of the overall
zero-occupancy curve $\psi_0$.

First, take the exponential case, and recall that
$\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling,
is



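\[
\begin{aligned}
\bar\psi_{k,0}(x)
&= C_k e^{-\rho_k\beta_k x/\beta}
 + \sum_{i=0}^{I-k}\bigl(\omega_{k,i} - C_k\mathcal{P}_i(\rho_k\beta_k)\bigr)
   \left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k e^{-\rho x}
 + \sum_{i=0}^{I-k}\bigl(\omega_{k,i} - C_k\mathcal{P}_i(\rho\beta)\bigr)
   \left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k\left[e^{-\rho x}
 + \sum_{i=0}^{I-k}(W_{k+i} - 1)\mathcal{P}_i(\rho\beta)
   \left(1 - \frac{x}{\beta}\right)^{i}\right].
\end{aligned}
\]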

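Here we have used that
$\omega_{k,i} = \pi^*_{k,i} = C_k\mathcal{P}_i(\rho\beta)W_{k+i}$ when
$i \le I - k$. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{0,0}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x)
&= \alpha_0 C_0\left[\rho^k e^{-\rho x}
 + \sum_{i=k}^{I}(W_i - 1)\mathcal{P}_i(\rho\beta)
   \frac{i!}{(i-k)!\,\beta^k}\left(1 - \frac{x}{\beta}\right)^{i-k}\right] \\
&= \alpha_0 C_0\rho^k\left[e^{-\rho x}
 + \sum_{j=0}^{I-k}(W_{k+j} - 1)\mathcal{P}_j(\rho\beta)
   \left(1 - \frac{x}{\beta}\right)^{j}\right] \\
&= \frac{\alpha_0 C_0\rho^k}{C_k}\,\bar\psi_{k,0}(x).
\end{aligned}
\qquad \text{(A.25)}
\]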
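The re-indexing $i = k + j$ used in passing between the two sums in (A.25) rests on a Poisson identity: differentiating $(1 - x/\beta)^{i}$ a total of $k$ times produces the factor $i!/((i-k)!\,\beta^k)$, and $\mathcal{P}_{k+j}(\rho\beta)\,(k+j)!/(j!\,\beta^k) = \rho^k\mathcal{P}_j(\rho\beta)$. The Python sketch below checks this identity numerically, assuming $\mathcal{P}_j(m) = e^{-m}m^j/j!$; the function names and the values of `rho` and `beta` are hypothetical.

```python
import math


def poisson_pmf(j, mean):
    """Poisson probability e^{-mean} * mean^j / j!."""
    return math.exp(-mean) * mean ** j / math.factorial(j)


def differentiated_coefficient(k, j, rho, beta):
    """P_{k+j}(rho*beta) * (k+j)! / (j! * beta^k)."""
    return (poisson_pmf(k + j, rho * beta) * math.factorial(k + j)
            / (math.factorial(j) * beta ** k))


def reindexed_coefficient(k, j, rho, beta):
    """rho^k * P_j(rho*beta)."""
    return rho ** k * poisson_pmf(j, rho * beta)


rho, beta = 1.4, 2.5  # illustrative values only
worst = max(
    abs(differentiated_coefficient(k, j, rho, beta)
        / reindexed_coefficient(k, j, rho, beta) - 1.0)
    for k in range(6)
    for j in range(6)
)
print(worst)
```

The printed value is the largest relative discrepancy over the tested index pairs and should be at rounding level.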



Using Theorem A.1 to express $\bar\gamma_{k,j}$ in terms of $\bar\psi_{k,0}$, we have



k=0 



(i-k)\ 



x i k , _„-_ fe tf 



dx^^ {x) 
Similarly, we obtain



-Ipiix) = - (X k —1p kti -k I — T- 
k=0 " ^ " 

-^a G p*(i-fc)! ( 1} ^° 
so that the extremals satisfy the simple relation 

(A.26) 

which also arose in the case of empty initial conditions. Because the
polynomial portion of $\psi_0$ has degree $I$, the ratio
$\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which
establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \qquad \text{(A.27)}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are
sufficient to show that the $\gamma_i$ satisfy the Euler-Lagrange equations.
In the polynomial case, similar computations show that 

(A.28) (-lM\x) = ^^ k ,o(*) 

and that 

(A.29) «,(,) = ± a^-m).*^ 

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10). □

In order to demonstrate a strong minimum we will need the following. 

Corollary A.12. For irreducible constraints, the occupancy functions
defined in Theorem A.11 satisfy the integrated version of the Euler-Lagrange
equations: given $x_1 \in [0,\beta)$, there are constants $C_i$,
$i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case) depending on
$x_1$ only such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx
+ \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11,
the extremals satisfy the usual form of the Euler-Lagrange equations. The
above indefinite integral is
$\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both
endpoints, because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$,
$i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2.
The partial derivative with respect to $\theta_i$ also exists for the same
reason. □

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11
for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost
$J(\gamma)$ is the solution to the minimization problem (A.20) subject to
constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of
functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that
proof to satisfy the conditions of Lemma A.3. Hence, the cost is



J(7) = /3 + E 



i=0 
K 00 



k=0 



f>iogi4 fc) (o)i 

k=0 



fc=o i=o 



4 k) (0)e-P 



The fact that $\bar\psi_{k,0}(0) = 1$ together with (A.25) implies that
$\psi_0^{(k)}(x) = \psi_0^{(k)}(0)\,\bar\psi_{k,0}(x)$, so that



Then 



4 fc) (0)e-^ VP J n '°V P J -{-x)ie-P/j\ 



K 
k=0 

By construction, the endpoints $\bar\gamma_{k,j}(\beta)$ coincide with the
optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use
Lemma A.4 and (A.28). □



A.8. Extremal curves have globally minimal cost. In this section we
prove the following theorem:
Theorem A.14 (Strong minimum). Given feasible constraints
$(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy
path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy
path satisfying the same constraints. Then
$J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid
occupancy functions, that is, vector functions $\gamma$ such that the
cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and
let $\mathcal{O}(\alpha,\omega,\beta) = \{\gamma \in \mathcal{O} :
\gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid
occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$.
The following lemma is a straightforward consequence of the convexity of the
map $(\theta,\gamma) \mapsto D(\theta\|\gamma)$; its proof is omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.

(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.
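The joint convexity behind Lemma A.15 can be illustrated on finite probability vectors. The sketch below assumes that $D(\theta\|\gamma)$ is the usual relative entropy $\sum_i \theta_i\log(\theta_i/\gamma_i)$; the vectors and function names are hypothetical, and the check is a numerical illustration rather than a proof.

```python
import math


def rel_entropy(theta, gamma):
    """Relative entropy sum_i theta_i * log(theta_i / gamma_i), with 0*log(0) = 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0.0)


def mix(a, b, eps):
    """Convex combination (1 - eps) * a + eps * b, componentwise."""
    return [(1.0 - eps) * u + eps * v for u, v in zip(a, b)]


# Two hypothetical (theta, gamma) pairs of probability vectors.
theta0, gamma0 = [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]
theta1, gamma1 = [0.6, 0.1, 0.3], [0.25, 0.5, 0.25]

gaps = []
for step in range(11):
    eps = step / 10.0
    along_path = rel_entropy(mix(theta0, theta1, eps), mix(gamma0, gamma1, eps))
    chord = (1.0 - eps) * rel_entropy(theta0, gamma0) + eps * rel_entropy(theta1, gamma1)
    gaps.append(chord - along_path)  # nonnegative when the map is jointly convex

print(min(gaps))
```

Each entry of `gaps` compares the chord with the value along the interpolated pair; joint convexity means no entry is negative beyond rounding error.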

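For a given pair of occupancy functions
$\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote
$\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$
and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] = J(\gamma^\varepsilon)
= \int_0^\beta D(\theta^\varepsilon(x)\|\gamma^\varepsilon(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$
on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined,
convex function for any
$\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.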

For the remainder of this section, we once again restrict attention to irre- 
ducible constraints, without loss of generality. The following lemma estab- 
lishes the minimality of the extremals in an important special case. The proof 
uses the fact that the extremal curve and its derivative can be bounded away 
from zero. 

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive,
irreducible feasible constraints, where the upper indices $K$ and $I$ of
$\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$
in the polynomial case. Suppose that $\gamma$ is the extremal curve for these
constraints, defined in Theorem A.11, and that $\tilde\gamma$ is any competing
occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\tilde\gamma).
\]

Proof. We construct the family of paths
$\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$, with
$G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$
is left and right differentiable wherever it is finite. We will show that
$G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$.
The convexity of $G$ then implies the desired result, since
$G[1] \ge G[0] + G'_+[0] = G[0]$.

It will be convenient to work with the cumulative occupancy functions
$\psi^\varepsilon$, and as in Section A.3, we will mix notation by writing,
for example, $J(\gamma) =$

After defining 

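\[
\eta(x) = \tilde\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx
= \int_0^\beta g(x,\varepsilon)\,dx. \qquad \text{(A.30)}
\]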

We can assume without loss that $G[1] = J(\tilde\gamma) < \infty$, since there
is nothing to prove otherwise. We wish to show that differentiation under the
integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0.
The validity of this operation will follow from [23], Corollary 39.2, if we
can provide a constant bound on the partial derivative of $g$ with respect to
$\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components
$\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly
bounded away from zero. For specificity, we first assume that
$(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty
case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to
$\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to
$\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1
reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and
$\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any
exponential problem with empty initial conditions and positive terminal
conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away
from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration,
each of the associated subproblems is an exponential empty problem with
positive terminal conditions, and so $\gamma_{k,0}(x)$ and $\theta_{k,0}(x)$
are uniformly bounded from zero for each subproblem. These, in turn, lower
bound the overall $\gamma_i$ and $\theta_i$, since, for example,

i 

k=0 

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence,
satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity
$\theta_{I+}(x) = \rho\,\gamma_{I+}(x)$, established in the proof of
Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1-\varepsilon)\gamma$ and
$\theta^\varepsilon \ge (1-\varepsilon)\theta$, so that for each
$0 \le \varepsilon < 1$, these functions are also uniformly bounded from zero
on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are
positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which
hold uniformly for $x \in [0,\beta]$ and
$\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and
bounds hold everywhere except possibly on a set of measure zero where
$\tilde\theta$ may not exist.

The partial derivative of the integrand of (A. 30) is 



d . ^ dL 



Os 



(x)r}i(x) -J2 9 



i=0 



d6i 



i=0 " ri 

The partial derivatives of L are (in mixed notation) 
dL 

dipi 
dL 



(x)fji(x). 



7i(x) 7i+i(x)' 



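The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$,
together with upper bounds $\gamma^\varepsilon \le 1$,
$\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$,
combine to establish that there is a finite $B$ such that

\[
\left|\frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\right| \le B
\quad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial
case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a
role, and the expression for the partial derivative of $L$ with respect to
$\theta_i$ simplifies.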
We are now free to differentiate under the integral sign so that 

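\[
G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx
\]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[
\int_0^x \frac{\partial L}{\partial\psi_i}\,dx'
= \int_0^x \left(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\right) dx'
\]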

is absolutely continuous as a function of x, since it is the difference of two 
continuous monotone functions with bounded derivatives. Applying integra- 
tion by parts for absolutely continuous functions ([23], 36.1, page 209) to 
the first term, we obtain 



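\[
\begin{aligned}
G'_+[0]
&= \int_0^\beta \sum_i \left[\frac{\partial L}{\partial\psi_i}(x)\,\eta_i(x)
   - \frac{\partial L}{\partial\theta_i}(x)\,\dot\eta_i(x)\right] dx \\
&= -\int_0^\beta \sum_i \dot\eta_i(x)\left[\int_0^x \frac{\partial L}{\partial\psi_i}(x')\,dx'
   + \frac{\partial L}{\partial\theta_i}(x)\right] dx \\
&= -\int_0^\beta \sum_i C_i\,\dot\eta_i(x)\,dx \\
&= \sum_i C_i\bigl(\eta_i(0) - \eta_i(\beta)\bigr) = 0.
\end{aligned}
\]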
The constants $C_i$ appearing in the third equality come from the integrated
version of the Euler-Lagrange equations (Corollary A.12), while the last
equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same
beginning and end points. □

We now extend the lemma to show that the extremal is a global minimizer
even when some of the initial and terminal points are zero. Without loss of
generality we suppose that $\alpha_0 > 0$ and note that the extremals defined
in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at
the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate
occupancy function. It may be supposed that $\tilde\gamma$ also lies on the
boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$
with lower cost than $\gamma$, the convexity of $J$ implies that there is
another occupancy path of the form
$\lambda\tilde\gamma + (1-\lambda)\gamma$ which also has lower cost than
$\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

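Given $\tilde\gamma$ and sequences $x_1^{(n)} \downarrow 0$ and
$x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with
initial point $\tilde\gamma(x_1^{(n)})$ and terminal point
$\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\|\tilde\gamma(x))\,dx.
\]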

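It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_{n\to\infty} \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_{n\to\infty} J(\gamma^{(n)}) \\
&\le \lim_{n\to\infty} \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\|\tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition
of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from
Corollary A.7 and the last equality from the monotone convergence theorem.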